Deep Network Learning based on TF-IDF Text Features for Electric Power Speech Text
Preprocessing Method
Zhao Xin1,*
Huang Changda1
(State Grid Xinjiang Electric Power Co., Ltd Marketing Service Center, Qinyang 454550,
China)
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Keywords
Graph convolutional neural network, Text-based classification, TF-IDF, Electric power equipment, Text data recognition
1. Introduction
The popularization of the Internet has led to the rapid development of areas such
as cloud computing and the Internet of Things (IoT), resulting in an exponential growth
of Internet data. These data cover a variety of forms such as text, audio, video and
pictures, among which text data occupies an important position. Taking power data
sites as an example, the Internet is flooded with a large amount of relevant information
[1,2]. Meanwhile, with the rapid popularization of mobile devices, diversified social platforms such as WeChat and microblogs have emerged one after another. The rapid growth of data brings convenience to people's access to information [3,4], but it also means that people must spend considerable time extracting the part they need from a large amount of information. How to effectively obtain and organize information has therefore become an urgent problem, and data mining [5-7], information retrieval [8-10], and related information-processing methods have rapidly gained importance and development. In the 1950s, H.P. Luhn published a paper [10] that caused a great sensation in the field of text-based classification, pioneering the introduction of word frequency statistics into text-based classification research. In the 1960s, Maron published a paper entitled ``Automatic Indexing: An Experimental Inquiry'' in the Journal of the ACM [11], which had a profound impact on subsequent research on text-based classification for search engines. In 1973, Salton [12] and others first proposed the vector space model (VSM), which uses vectors to represent the feature terms of the text to be processed and gives the text a new representation according to a specific theory; this representation model remained in use for a long time. Until the end of the 1980s, text-based categorization was still dominated by knowledge engineering, and the CONSTRUE [13] system developed by the Carnegie Group was based on this technique. Entering the 21st century, the Internet reached a phase of rapid development, with the data produced by the network growing exponentially every day; people's needs also grew dramatically, and traditional manual classification could no longer keep up and was gradually phased out. Along with the development of artificial intelligence [14], machine learning [15], pattern recognition [16], statistical theory [17], and other disciplines, automatic text-based classification systems have gradually replaced manual classification techniques. Most of these systems are based on machine learning, are far more efficient than human experts, and still maintain a very high level of accuracy. Therefore, machine learning has been actively researched in the field of text-based classification, with methods such as naive Bayes [18], K-nearest neighbor [19], neural networks [20], and support vector machines [21]. With the continuous development of machine learning, deep network learning has also been studied [22]. Hidden textual features in textual data are not easy to uncover and extract with shallow neural networks, whose processing differs greatly from human thought patterns. The purpose of text-based classification is to bring the classification process closer to the human thinking process. Deep network learning is derived from machine learning but focuses more on longitudinal, multi-level data mining and analysis than shallow machine learning. It has a wide range of applications, especially in image processing [23] and speech recognition [24].
It is because of the excellent performance of deep network learning in these respects that it has gradually been applied to text-based classification in recent years. In 2003, the distributed representation of words was used in statistical language modeling by Bengio [25]. In 2008, the concept of word vectors was first proposed by Collobert et al. and later introduced into convolutional neural networks. Google introduced the Word2vec technique in 2013, which has been widely used in the field of text modeling. Word2vec trains each word by filtering out words with very high or very low frequencies in the text, combining the contextual information of the target word and representing it with a low-dimensional vector [26]. It can better represent the relationships between words and express their latent semantic information using low-dimensional vectors. Subsequently, Mikolov et al. [27] disclosed two methods to compute word vectors, CBOW and Skip-Gram, and accomplished efficient training of text sets using these two methods [28]. Under the leadership of Wu Jun [29], an automatic Chinese corpus classification system was developed in the Department of Electronic Engineering at Tsinghua University; based on the corpus correlation coefficient, it used word frequency and a stop word list to remove non-feature words and then performed classification. In 1999, Zou Tao [30] et al. introduced an automatic classification system for Chinese documents at Nanjing University. In 2000, Li Xiaoli and Shi Zhongzhi [31] of the Institute of Computing Technology, Chinese Academy of Sciences (CAS), developed a text-based classification system that reached a high level. Then Fan Yan [32] et al. at CSCU proposed a hypertext coordinated classifier, which used KNN and Bayesian algorithms and handled text similarity effectively.
2. Electricity Speech Text Data Mining with Graph Convolutional Networks
Meanwhile, in research on power services in the context of big data, power grid companies have accumulated massive and diverse power operation data. More than 80% of these data are unstructured, such as audio recordings and text data. The unstructured data mainly come from the customer service systems of power grid companies, and the text data contain customer fault reports, information queries, business processing, and other business needs [33]. Making full use of these text data and deeply understanding the real needs of customers is of great significance for further improving the level of power supply and electricity service and improving the user experience. Data mining technology based on traditional convolutional networks cannot characterize text data, so text mining technology combined with graph convolutional networks came into being. Text mining technology combines computer technology, artificial intelligence algorithms, etc., to extract valuable information from text [34-36]. At present, the applications of text mining in the electric power field mainly include power equipment state perception, fault diagnosis, and system reliability assessment [37-39], but its application in the field of power operation is limited. In this regard, this paper applies graph convolutional networks combined with text mining technology to the information processing of text data in power operations, to realize the text-based classification of power operations, deeply understand the needs of electric power customers, and then improve the service level of the power grid company.
The main purpose of modeling power equipment text data with this graph convolutional network is to extract spatial features in topological space. There are two methods to extract the features: one based on convolution in the spatial domain and the other based on convolution in the frequency domain. In layman's terms, spatial-domain convolution can be compared to convolving directly on the pixels of a picture, while frequency-domain convolution can be compared to taking the Fourier transform of a picture and then convolving. The process of computing the power speech text convolution can be described as follows: the input signal is first decomposed into impulse functions via the signal sampling theorem; the impulse response of the system to each impulse function is then found, and summing these impulse responses gives the zero-state response of the system to the input signal, as expressed by Eqs. (13) and (14) for the spatial-domain convolution:

$$x(t)=\int_{-\infty}^{+\infty}x(\tau)\,\delta(t-\tau)\,d\tau \tag{13}$$

$$y(t)=\int_{-\infty}^{+\infty}x(\tau)\,h(t-\tau)\,d\tau \tag{14}$$
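As a concrete illustration of this impulse-decomposition view, the following minimal Python sketch (with illustrative signal values only, not data from this paper) computes the discrete counterpart of Eq. (14) by summing shifted impulse responses and checks the result against NumPy's built-in convolution.

import numpy as np

# A minimal sketch of the zero-state response described above: the input
# signal is decomposed into weighted impulses, each impulse excites the
# system's impulse response h, and the shifted responses are summed.
x = np.array([1.0, 2.0, 0.5, -1.0])   # sampled input signal x[n] (illustrative)
h = np.array([0.5, 0.25, 0.125])      # system impulse response h[n] (illustrative)

# Direct summation, mirroring y[n] = sum_k x[k] * h[n - k]
y = np.zeros(len(x) + len(h) - 1)
for k, xk in enumerate(x):
    y[k:k + len(h)] += xk * h

# The same result via NumPy's built-in discrete convolution
assert np.allclose(y, np.convolve(x, h))
print(y)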
The basic idea of the preliminary design of the graph convolutional neural network and a concrete representation of its key processes are shown in Fig. 1. By iterating the convolution steps until the desired number of layers is reached, the local output function and the target output function of the graph convolutional neural network are obtained.
In addition, when the graph convolutional neural network is applied to TF-IDF text-based classification, the classification accuracy depends mainly on the input word vectors, and some word vector inputs do not take into account the important correlation information between word items and between words and documents, so the graph convolutional neural network is introduced to solve this problem. In this study, the main steps of text feature extraction first capture text data on the Internet based on data mining. After power speech text preprocessing, the cluttered unstructured text is transformed into structured data, and a combination of supervised and unsupervised learning methods is used to calculate the similarity of text feature values and extract them, so as to determine the optimal text features. However, the number of neighbor nodes of each node is not fixed, and the node features in the graph cannot be extracted directly with a traditional convolution kernel. The most important task is to find the association relationships that exist in the text information and so construct the TF-IDF graph vectors.

According to the above algorithmic model, the algorithmic flow of power operation information processing based on TF-IDF-LSTM is designed. The raw text of the electric power operation is taken as input, and data preprocessing operations such as text cleaning and text segmentation are then carried out. The extraction of text data features is further realized with the TF-IDF algorithm. Finally, the classification and recognition of power operation text are realized by a deep classification model. The convolution process captures the local structural and semantic information in power speech text and helps to understand the intrinsic patterns of the data, while the matrix transformation converts the raw text data into a matrix form suitable for convolution operations, which is crucial for representing node attributes and connectivity relationships. The combination of these two methods improves the model's ability to recognize and classify power operation text data and provides new ideas for solving the lack of effective application of power operation text data. The TF-IDF-based graph convolutional neural network text-based classification process for power equipment topics is shown in Fig. 2; it is mainly divided into two parts: the text feature extraction method of the Labeled-LDA model and the text-based classification model of the graph convolutional neural network.
Step 1: Power speech text preprocessing. Assume three collections, each containing the same $m$ training documents: $D_1 = \{d_1, d_2, \ldots, d_m\}$, $D_2 = \{d_1, d_2, \ldots, d_m\}$, and $D_3 = \{d_1, d_2, \ldots, d_m\}$. The power speech text preprocessing work of word segmentation, de-duplication, etc., is performed on $D_1$, $D_2$, and $D_3$, and each document paragraph is split into single sentences.
Step 2: The TF-IDF output is input into the Labeled-LDA model to obtain the feature matrix of the topic labels, after which the graph vectors of the power speech text are constructed.
Step 3: A graph network structure is constructed according to the electric power recognition method described above and input into the graph convolutional neural network model; after iterative training, a text feature matrix is obtained, and graph recognition and classification are performed.
Step 4: The topic label feature matrix $v_1$ and the text feature matrix $v_2$ are spliced to obtain the multi-source fusion features, after which the local output and the target output are computed. The multi-source fusion features are input into the Softmax classifier to obtain the classification results and, finally, the electric power speech text recognition results.
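As an illustration of Step 4, the following minimal sketch (random placeholder features and weights, not trained values; the names v1 and v2 follow the step above) splices the two feature matrices and applies a Softmax classifier.

import numpy as np

# A minimal sketch of Step 4: v1 is the topic-label feature matrix from the
# Labeled-LDA branch and v2 is the text feature matrix from the graph
# convolutional branch. All values here are illustrative placeholders.
rng = np.random.default_rng(0)
n_docs, d1, d2, n_classes = 8, 6, 16, 6

v1 = rng.random((n_docs, d1))             # topic-label features
v2 = rng.random((n_docs, d2))             # GCN text features
fused = np.concatenate([v1, v2], axis=1)  # multi-source feature splicing

W = rng.normal(scale=0.1, size=(d1 + d2, n_classes))  # classifier weights
logits = fused @ W

# Softmax classifier over the fused features
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
print(probs.argmax(axis=1))               # predicted class per document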
Among them, the core idea of TF-IDF is that if a word appears at a high frequency in a certain text sample but appears less frequently in the other text samples of the total power speech text corpus, then the word can be considered to have a strong distinguishing ability for that power speech text sample and can be used as a classification label for the text data. Therefore, the TF-IDF algorithm uses the product of the term frequency and the inverse document frequency as the weight. The term frequency is calculated as follows:

$$\mathrm{TF}_{ij} = \frac{n_{ij}}{\sum_{k} n_{kj}}$$

where $n_{ij}$ is the number of occurrences of word $i$ in text $j$, and the summation term is the total number of all words in text $j$. The IDF describes the inverse of the frequency of occurrence of word $i$ in the other texts and is calculated as follows:

$$\mathrm{IDF}_{i} = \log\frac{|D|}{\left|\left\{j\colon i\in j\right\}\right| + 1}$$
where $|D|$ is the total number of power speech text samples and $\left|\left\{j\colon i\in j\right\}\right|$ is the number of texts containing word $i$. To avoid a zero denominator when no power speech text sample contains word $i$, 1 is usually added to the denominator. The specific TF-IDF text feature extraction process is shown in Fig. 3.
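As a concrete illustration of the two formulas above, the following minimal sketch (toy corpus, with the smoothed denominator $|\{j\colon i\in j\}|+1$ as assumed above) computes TF-IDF weights of individual words by hand.

import math
from collections import Counter

# A minimal sketch of the TF-IDF weights defined above. The toy corpus is
# illustrative only, not the paper's power speech text dataset.
corpus = [
    "transformer overheating fault reported by customer",
    "customer asks about electricity bill query",
    "transformer maintenance schedule query",
]
docs = [doc.split() for doc in corpus]
n_docs = len(docs)

def tf_idf(word, doc):
    counts = Counter(doc)
    tf = counts[word] / len(doc)            # TF_ij = n_ij / sum_k n_kj
    df = sum(1 for d in docs if word in d)  # |{j : i in j}|
    idf = math.log(n_docs / (df + 1))       # smoothed IDF_i
    return tf * idf

print(tf_idf("overheating", docs[0]))  # rare word: positive weight
print(tf_idf("customer", docs[0]))     # common word: weight driven to zero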
Fig. 1. Convolutional process and matrix transformation of graph convolutional neural
network.
Fig. 2. TF-IDF text-based classification modeling flow for graph convolutional neural networks.
Fig. 3. TF-IDF Text Feature Extraction with Graph Convolutional Neural Networks.
3. Experimental Results and Analysis of Electric Power Speech Text Data Mining
3.1 Preparation of the Experiment
All the power speech text experiments in this study were done on a computer running the Windows 10 operating system, with the following hardware configuration: Intel Core i7, 3.4 GHz, dual-core four-thread CPU, 16.00 GB RAM, and a 256 GB SSD. The algorithmic code of the experiments in this paper was implemented on the Jupyter platform, with Python 3.6 as the development language, using Excel 2010 and a MySQL relational database (version MySQL 5.5) for data storage and Navicat software for visual access.

The Labeled-LDA model proposed in this experiment [23] is compared with the traditional TF-IDF and LDA topic models for text feature extraction to validate the effectiveness of the algorithm proposed in the previous section and to ensure the validity of the experimental process; part of the core algorithm code is given in Table 1.
Table 1. Implementation Flow of Some Core Codes.
Code Implementation Flow
from sklearn.feature_extraction.text import TfidfVectorizer

# Define the text data
documents = [
    'This is the first document.',
    'This is the second document.',
    'This is the third document.',
    'This is the third document. The third document contains some repeated words.',
    'The fourth document is very similar to the third.',
]

# Initialize the TF-IDF vectorizer
vectorizer = TfidfVectorizer()

# Transform the text data into TF-IDF feature vectors
tfidf_matrix = vectorizer.fit_transform(documents)

# Output the IDF value of each word
print('IDF values for each word:')
print(vectorizer.idf_)

# Output the shape of the TF-IDF feature matrix
print('Shape of TF-IDF feature vector:')
print(tfidf_matrix.shape)

# Output the dense TF-IDF feature matrix
print('TF-IDF feature vector:')
print(tfidf_matrix.toarray())
3.2 Study of Experiment 1
In the data mining experiments that extract power equipment topics based on the LDA topic model, most of the literature [29,30] sets the model parameters $\alpha$ and $\beta$ as $\alpha = 50/k$ and $\beta = 0.01$, where $k$ is the number of latent topics, which is adjusted according to the application scenario and the actual situation.

Experiment 1 matches the keywords extracted by TF-IDF with the power equipment topic data extracted by LDA. Since the data fall into six categories, the number of topics $k$ is set to 6 and the number of Gibbs sampling iterations to 600. Table 2 gives an example of the LDA topic model word recognition results, where Topic_1, Topic_2, Topic_3, Topic_4, Topic_5, and Topic_6 are the topic numbers recognized by LDA. The results after TF-IDF computation are shown in Table 3.
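As a reference for this parameter setting, the following minimal sketch configures an LDA model with $k=6$, $\alpha=50/k$, and $\beta=0.01$ on a placeholder corpus. Note that scikit-learn fits LDA with variational inference rather than the Gibbs sampling used here, so this only approximates the configuration described.

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Illustrative stand-in corpus, not the paper's power speech text dataset.
corpus = [
    "transformer winding temperature alarm",
    "customer electricity bill query",
    "power outage fault report",
    "meter reading business processing",
    "substation equipment maintenance record",
    "voltage fluctuation complaint",
]

k = 6
X = CountVectorizer().fit_transform(corpus)
lda = LatentDirichletAllocation(
    n_components=k,
    doc_topic_prior=50 / k,   # alpha = 50/k
    topic_word_prior=0.01,    # beta = 0.01
    max_iter=600,             # mirrors the 600 iterations of the paper
    random_state=0,
)
doc_topic = lda.fit_transform(X)  # document-topic distribution
print(doc_topic.round(3))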
The Simhash similarity between the top 100 keywords by TF-IDF weight (dataset words ordered by frequency of occurrence) and the top 100 power equipment topic data by LDA weight is analyzed, and the results are shown in Fig. 4. It can be seen that the validity of the recognition classification is high.
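As an illustration of this comparison, the following minimal sketch (illustrative keyword lists and weights, not the paper's top-100 lists) computes 64-bit Simhash fingerprints for two weighted keyword lists and their similarity as one minus the normalized Hamming distance.

import hashlib

# A minimal 64-bit Simhash sketch for comparing two weighted keyword lists.
def simhash(keywords, bits=64):
    votes = [0] * bits
    for word, weight in keywords:
        h = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16)
        for i in range(bits):
            votes[i] += weight if (h >> i) & 1 else -weight
    return sum(1 << i for i in range(bits) if votes[i] > 0)

def similarity(a, b, bits=64):
    # Similarity = 1 - normalized Hamming distance between fingerprints
    return 1 - bin(a ^ b).count("1") / bits

tfidf_top = [("transformer", 0.9), ("fault", 0.8), ("query", 0.5)]
lda_top = [("transformer", 0.7), ("fault", 0.6), ("outage", 0.4)]
print(similarity(simhash(tfidf_top), simhash(lda_top)))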
To verify the performance of the proposed algorithm in Experiment 1, the three feature selection algorithms are evaluated with a naive Bayes classifier in terms of classification accuracy, recall, and the F1 metric, comparing the three text feature extraction algorithms: TF-IDF, the traditional LDA topic model, and the Labeled-LDA model. Fig. 5(a) shows the comparison of accuracy, Fig. 5(b) the comparison of recall, and Fig. 5(c) the comparison of the F1 value of the three feature extraction methods.
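As a sketch of this evaluation protocol, the following minimal example (toy texts and labels, standing in for the power speech text dataset) feeds TF-IDF features, one of the three compared methods, to a naive Bayes classifier and computes precision, recall, and F1 with scikit-learn.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import precision_recall_fscore_support
from sklearn.naive_bayes import MultinomialNB

# Toy stand-in data: two classes (0 = fault report, 1 = billing query).
train_texts = ["fault report outage", "bill query payment",
               "fault alarm transformer", "payment query invoice"]
train_labels = [0, 1, 0, 1]
test_texts = ["transformer fault outage", "invoice bill payment"]
test_labels = [0, 1]

vec = TfidfVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(train_texts), train_labels)
pred = clf.predict(vec.transform(test_texts))

precision, recall, f1, _ = precision_recall_fscore_support(
    test_labels, pred, average="macro")
print(precision, recall, f1)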
The method used in Experiment 1 improves the accuracy, recall, and F1 value; for example, the F1 value of the feature extraction with the Labeled-LDA model is higher than those of TF-IDF and the traditional LDA topic model by 1.82% and 3.92%, respectively, so the F1 value of the improved LDA topic model is higher. Overall, the accuracy of text feature extraction based on the Labeled-LDA model is higher than that of the traditional LDA topic model and TF-IDF feature extraction; the traditional LDA topic model extracts the power equipment topic data mainly through a fully probabilistic unsupervised model. Through the fusion algorithm of the traditional LDA topic model and TF-IDF, with TF-IDF serving as an additional label for the LDA categories, the feature topics can be determined effectively, so the text feature extraction method proposed in the previous section is more effective and stable.
Fig. 4. Extraction of validity degree for TF-IDF weight recognition classification.
Fig. 5. Comparison of extraction results of features under three different methods.
Table 2. LDA Identification Results.

Topic   | Power equipment data identification number X (X_y)
Topic_1 | M (28), C (11), D (89), E (25), A (28)
Topic_2 | F (21), A (71), L (11), N (11), B (35)
Topic_3 | X (13), J (12), H (11), K (38), E (11)
Topic_4 | F (41), G (33), J (11), L (08), F (39)
Topic_5 | A (51), S (34), D (11), O (48), R (45)
Topic_6 | D (16), C (56), B (11), A (47), T (751)
Table 3. TF-IDF Results for Identifying Critical Data Areas.

Major category identification | Key data area X (X)
Topic_1 | M (2), C (1), D (8), E (2), A (2)
Topic_2 | F (2), A (7), L (1), N (1), B (3)
Topic_3 | X (1), J (1), H (1), K (3), E (1)
Topic_4 | F (4), G (3), J (1), L (0), F (3)
Topic_5 | A (5), S (3), D (1), O (4), R (4)
Topic_6 | D (1), C (5), B (1), A (4), T (7)
3.3 Study of Experiment 2
In Experiment 2, the association relationships between word items and between words and documents are mined, graph vectors are constructed, and the application of the graph convolutional neural network to text-based classification is realized. The accuracy, recall, and F1 value of the graph convolutional neural network text-based classification model are tested by varying the proportion of the training set, the window size, and the word embedding dimension, which proves the reliability of the algorithm. The graph convolutional neural network model is applied to the text-based classification data mined in the experiment; following the literature [43], the number of convolutional layers in the Text-GCN model is set to 2, the learning rate to 0.03, the dropout to 0.5, and the regularization parameter of the loss function to 0.
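As a reference for the layer structure used here, the following minimal sketch (illustrative sizes and random placeholder weights, not the trained model) shows one graph convolutional layer of the form used in Text-GCN: $H' = \mathrm{ReLU}(\hat{A} H W)$ with the renormalized adjacency $\hat{A} = D^{-1/2}(A+I)D^{-1/2}$.

import numpy as np

# A minimal sketch of one graph convolutional layer.
rng = np.random.default_rng(0)
n_nodes, in_dim, out_dim = 5, 8, 4

A = (rng.random((n_nodes, n_nodes)) > 0.6).astype(float)
A = np.maximum(A, A.T)                 # symmetric adjacency
A_tilde = A + np.eye(n_nodes)          # add self-loops
d = A_tilde.sum(axis=1)
D_inv_sqrt = np.diag(d ** -0.5)
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt  # renormalized adjacency

H = rng.random((n_nodes, in_dim))      # input node features
W = rng.normal(scale=0.1, size=(in_dim, out_dim))  # layer weights
H_out = np.maximum(A_hat @ H @ W, 0)   # ReLU activation
print(H_out.shape)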
In the Text-GCN experiments, the dimension of the word embeddings in the input layer is one of the most important hyperparameters of the model; if the dimension is not chosen properly, overfitting occurs. In this experiment, the word embedding dimension is increased in increments of 50, starting from a base of 50, and the experimental results are shown in Fig. 6(a). Fig. 6(a) shows the effect of different word embedding dimensions in the Text-GCN input layer on classification performance, where the horizontal coordinate is the word embedding dimension and the vertical coordinate is the evaluation index. Analyzing Fig. 6(a), the accuracy rises slowly as the dimension increases, and when the dimension reaches 300 the accuracy levels off at about 70 percent. It can be concluded that word embeddings with too low a dimension cannot propagate the text information well across the whole graph, while higher-dimensional word embeddings do not improve the classification performance and take more training time.

In the word co-occurrence model, the size of the scanning window has an important impact on learning the correlations between word items. In this experiment, the scanning window is incremented by 2 at a time, and the experimental results are shown in Figs. 6(b) and 6(d). The results give the accuracy of text-based classification under different window sizes, where the horizontal coordinate is the scanning window size and the vertical coordinate is the evaluation index of text-based classification. From Figs. 6(b) and 6(d), the accuracy rises slowly as the window size increases and levels off when the window size reaches 6. This reflects that too small a scanning window fails to capture the co-occurrence information between words, while too large a window weakens the correlation between words.

With the scanning window size and the word embedding dimension of the word co-occurrence model held fixed, the proportion of the training set is varied to test the accuracy, recall, and F1 value of text-based classification. Fig. 6(c) shows the effect of different training set proportions on the accuracy of text-based classification; the horizontal coordinate is the training set proportion and the vertical coordinate is the text-based classification index. From Fig. 6(c), the text-based classification accuracy is highest when the training set proportion is 75%. This further illustrates that the graph convolutional neural network text-based classification model achieves high-accuracy classification with limited category-labeled documents, and that text-based graph vectors can better capture the text category information.
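As context for the window analysis above, the following minimal sketch (toy tokens and an assumed window size of 6) computes window-based co-occurrence counts and positive PMI edge weights in the style of Text-GCN graph construction.

import math
from collections import Counter
from itertools import combinations

# Illustrative tokenized text, not the paper's power speech corpus.
tokens = "fault report transformer fault outage report customer query".split()
window_size = 6

# Slide a fixed-size window over the token sequence.
windows = [tokens[i:i + window_size]
           for i in range(max(1, len(tokens) - window_size + 1))]
word_count = Counter()
pair_count = Counter()
for w in windows:
    word_count.update(set(w))
    pair_count.update(frozenset(p) for p in combinations(set(w), 2))

n = len(windows)
for pair, c in pair_count.items():
    a, b = tuple(pair)
    pmi = math.log((c / n) / ((word_count[a] / n) * (word_count[b] / n)))
    if pmi > 0:  # Text-GCN keeps only positive-PMI word-word edges
        print(a, b, round(pmi, 3))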
Fig. 6. Classification features for text recognition under three different methods.
3.4 Study of Experiment 3
Experiment 3 verifies the effectiveness of the topic model text-based classification algorithm with graph convolutional neural networks. In this experiment, the topic category label matrices generated in Experiment 1 and the text feature matrices generated in Experiment 2 undergo multi-source fusion to achieve text-based classification. The parameters of Experiment 3 are selected from the parameters corresponding to the optimal experimental results of Experiments 1 and 2. Classification experiments compare the Text-GCN text-based classification model with several other classification models on the same dataset.
The experimental results are analyzed in Fig. 7. The accuracy of the text-based classification model combining Labeled-LDA and Text-GCN is 76.4%, which is higher than that of the Text-GCN classification model alone and of the model combining Labeled-LDA with a Softmax classifier. There are three main reasons: 1) constructing a graph structure with textual features can accurately capture the relationships between words and between words and documents for text-based classification; 2) word nodes can serve as a bridge that not only collects the category information of the text but also transfers it to the neighboring nodes of the word node, so that the textual information propagates through the entire graph network structure; 3) splicing the topic category labels with the text features carrying word and document information yields a multi-source feature fusion matrix in which the topic category labels complement the TF-IDF text feature matrix. This sufficiently shows that the text-based classification method of extracting text features with the Labeled-LDA model and then fusing them with the multi-source features of the graph convolutional neural network is very effective.
The traditional LDA combined with Softmax has the lowest accuracy, 66.1%, among the text-based classification models. However, the text-based classification model of Word2vec combined with TF-IDF has the highest accuracy, 81.5%, among the six models. The main reason is that Word2vec generates word vectors by modeling the relationships between context and target words, in both CBOW and Skip-Gram modes. With TF-IDF as input, Word2vec is trained on a large-scale corpus to generate word vector representations carrying contextual information about the target words, which performs well in text-based classification. Experiments on the power data text dataset show that the accuracy of the topic model text-based classification based on the graph convolutional neural network is 76.4%, the recall is 75.2%, and the F1 value is 75.8%, which are 3%, 3.4%, and 3.2% higher, respectively, than those of the plain graph convolutional neural network text-based classification method. Compared with text-based classification using the Labeled-LDA text feature extraction method, the accuracy increases by 3.5%, the recall by 1%, and the F1 value by 2.3%, proving that the TF-IDF graph CNN method proposed in this paper can effectively improve the accuracy of text-based classification and recognition of power speech.
In addition, as shown in Fig. 8, the complex textual data information generated by electric power equipment is well recognized and classified by the method of this paper; the overall trend of the data and its peaks and valleys are in good agreement, which proves the accuracy and efficiency of the method.
Fig. 7. Classification features for text recognition under three different methods.
Fig. 8. Schematic of recognition results with different complex text data.
4. Conclusions and Discussions
A graph convolutional neural network method for processing and analyzing electric power speech text data is proposed here. The details are as follows.
(1) We propose a method for processing power speech text data using graph convolutional neural networks. The original text is first cleaned and segmented, and then classified and recognized by a deep classification and recognition model. The effectiveness of the method is experimentally verified on the electric power text dataset, and the results show that the classification accuracy is highest when the training set proportion is 75%. This indicates that the text-based graph convolutional neural network classification model can achieve high-accuracy classification under the condition of limited category-labeled documents, and that the text-based graph vectors can better capture the text category information. The method provides a new idea for power speech text data processing and helps to improve the intelligence level of the power system.
(2) The accuracy of the TF-IDF graph convolutional neural network-based topic model text-based classification is 76.4%, the recall is 75.2%, and the F1 value is 75.8%, which are 3%, 3.4%, and 3.2% higher, respectively, than those of the graph convolutional neural network-based text-based classification method. Compared with text-based classification using the Labeled-LDA model-based text feature extraction method, the accuracy improves by 3.5%, the recall by 1%, and the F1 value by 2.3%. In addition, the method in this paper can recognize and classify the complex textual data information generated in electric power equipment, and the overall trend of the data and its peaks and valleys are in good agreement.
Currently, graph convolutional neural network models for power speech text data processing
face challenges, including extracting key information, handling heterogeneous data,
and improving generalization capabilities. Future research can explore the combination
of advanced techniques and optimization algorithms to enhance the model performance
and consider practical applications to improve the intelligence of power systems.
REFERENCES
Zhou R S, Wang Z J. A Review of a Text Classification Technique: K-Nearest Neighbor[C]//
International Conference on Computer Information Systems and Industrial Applications.
2015.
Mukherjee I, et al. An Improved Information Retrieval Approach to Short Text Classification.
International Journal of Information Engineering and Electronic Business, 2017, 9(4):31-37.
Wang J, Li L, Ren F. An improved method of keywords extraction based on short technology
text. Faculty, 2010.
Wang D, et al. Retrieval Methods of Natural Language Based on Automatic Indexing.
International Conference on Computer \& Computing Technologies in Agriculture. Springer
International Publishing, 2016:346-356.
Chi XX. Research of Information Filtering Model Based on BP Artificial Neural Network
and Genetic Algorithm. International Conference on Natural Computation. IEEE, 2010:
1788-1791.
Huang C, Trabelsi A, Qin X, et al. Seq2Emo for Multi-label Emotion Classification
Based on Latent Variable Chains Transformation. 2019.
Sundus K, Al-Haj F, Hammo B. A Deep Learning Approach for Arabic Text Classification[C]//2019
2nd International Conference on new Trends in Computing Sciences (ICTCS). 2019.
Zhang, et al. Detecting hate speech on Twitter using a convolution-GRU-based deep
neural network. ESWC 2018:745-760.
Liu D, Shi T, Didonato J A, et al. Application of genetic algorithm/k-nearest neighbor
method to the classification of renal cell carcinoma. IEEE, 2004.
Bolshoy A, et al. Mathematical Models for the Analysis of Natural-Language Documents.
Genome Clustering. Springer Berlin Heidelberg, 2010: 23-42.
Debra, et al. A Framework for Evaluating Automatic Indexing or Classification in the
Context of Retrieval. Journal of the Association for Information Science and Technology,
2016, 67(1): 3-16.
Salton G, Yang C S. On the specification of term values in automatic indexing. Journal
of Documentation, 1973, 29(4): 351-372.
Dong L, Leland R P. The adaptive control system of a MEMS gyroscope with time-varying
rotation rate. IEEE, 2005.
Kiritchenko S, Matwin S. Email Classification with Co-Training. Proceedings of CASCON, 2001:301-312.
Mitchell T M. Machine learning. McGraw-Hill, 2003.
Feng G, et al. Feature subset selection using naive Bayes for text classification.
Pattern recognition letters, 2015, 65(NOV.1): 109-115.
Deng Breaking, et al. A text-based classification method based on statistical distribution
and set theory. Journal of Beijing Institute of Technology, 2006(07): 589-592+597.
Qiang G. An Effective Algorithm for Improving the Performance of Naive Bayes for Text
Classification[C]//Second International Conference on Computer Research \& Development.
IEEE, 2010.
Trstenjak B, et al. KNN with TF-IDF based Framework for Text Categorization. Elsevier
Ltd, 2014:1356-1364.
Meirong Wang, Text-based classification algorithm based on convolutional neural network.
Journal of Jiamusi University (Natural Science Edition), 2017, 036(003): 354-357.
Costales J A, Tuquero A C B, Nolia N V, et al. The Development of Mobile-Based Symptom Analysis for Early Detection of Diseases Using Hyper-Tuned C-Support Vector Classification Algorithm[C]//2023 5th International Conference on Control and Robotics (ICCR). [2024-03-21].
Rodriguez-Cristerna A, Guerrero-Cedillo C P, Donati-Olvera G A, et al. Study of the
impact of image preprocessing approaches on the segmentation and classification of
breast lesions on ultrasound[C]//2017 14th International Conference on Electrical
Engineering, Computing Science and Automatic Control (CCE). IEEE, 2017.
Zhong S H, et al. Bilinear deep learning for image classification. Proceedings of
the 19th International Conference on Multimedia, 2011:343-352.
Kuniaki, et al. Audio-visual speech recognition using deep learning. Applied Intelligence,
2015, 42(4):722-737.
Bengio, et al. A Neural Probabilistic Language Model. Journal of Machine Learning Research, 2003, 3:1137-1155.
Collobert R, Weston J. A unified architecture for natural language processing: deep
neural networks with multitask learning. Machine Learning. Proceedings of the Twenty-Fifth
International Conference (ICML 2008), 2008:160-167.
Mikolov T, et al. Efficient Estimation of Word Representations in Vector Space. Proceedings
of the International Conference on Learning Representations, 2013:1-12.
Xue Chunxiang, Zhang Yufang. A review of research on Chinese text-based classification for the power data domain. Library and Intelligence Work, 2015, 057(014):134-139.
Wu Jun, et al. Automatic classification of Chinese corpus. Journal of Chinese Information,
1995, 9(4):25-32.
Shiwu X, Juan Y, Xia W. Design and implement of urban land classification and evaluation
information system based on data center[C]//2010 The 2nd Conference on Environmental
Science and Information Application Technology. IEEE, 2010.
Chun C, Xiaonan W, Yanling L. Design and realization of a DNA sequence classification
system based on support vector machines. Journal of China Agricultural University,
2005.
Fan Yan, et al. Performance study of hypertext coordinated classifier. Computer Research
and Development, 2000, 37(9): 1026-1031.
Ferris G R. Method of Storing Data Used in Backtesting a Computer Implemented Investment Trading Strategy: US11718751[P]. US20070244788A1 [2024-03-21].
Li C, Jian S, Min Z, et al. Multi-scenario Application of Power IoT Data Mining for
Smart Cities[C]//2019.
Ai M A M, Chen N C N, Ge X G X, et al. A CEP based ETL method of active distribution
network operation monitoring and controlling signal data. IET, 2016.
Ren Q, Zhuo X. Application of an improved K-means algorithm in gene expression data
analysis[C]// International Conference on Systems. IEEE, 2011.
Yang Dan, Zhu Shiling, Bian Zhengyu. Application of improved K-means-based algorithm
in text mining. Computer Technology and Development, 2019, 29(4):68-71.
Wang H, Wang H, Jiang L, et al. Research and application of improved K-means based
on MapReduce. Journal of Physics Conference Series, 2020,1651:012074.
Uckol H I, Ilhan S, Ozdemir A. Partial Discharge Pattern Classification based on Deep
Learning for Defect Identification in MV Cable Terminations[C]// 2020 IEEE International
Conference on High Voltage Engineering and Application (ICHVE). IEEE, 2020.
Xin Zhao received a Bachelor of Engineering degree from Liaoning University of Engineering and Technology in 2016, and currently works at the State Grid Xinjiang Electric Power Co., Ltd Marketing Service Center as a special person in charge. His research interests include channel management, big data analytics, industrial economy, and project management.

Changda Huang received a Bachelor's degree in Engineering from North China Electric Power University in 2016, and currently works at the State Grid Xinjiang Electric Power Co., Ltd Marketing Service Center as an operation supervisor. His research interests include high-quality service, channel management, big data analytics, and industrial economy.