3.1 CNN Entrepreneurship Project Recommendation Algorithm Design for Student Entrepreneurship Needs
The screening of industry-education-integrated entrepreneurial projects can be extremely
time-consuming and difficult for students. Recommendation systems allow for more
accurate and personalized recommendations because they actively search and analyze
the existing and historical behavioral information of users. Therefore, this research
makes an in-depth analysis of a recommendation system for student-oriented entrepreneurial
projects. First, the historical behavior of a student user can be used to compute
that user's rating of a particular entrepreneurial project, as in Eq. (1).
where $r_{ui}$ stands for the final rating of item $i$ by student user $u$; $a_{j}$
represents the weight of the $j$th interaction dimension; and $q_{u,i,j}$ stands for
the number of interactions of student user $u$ with item $i$ on interaction dimension
$j$. Because the user behavior data are implicit feedback, no negative samples are
observed directly. For this reason, the negative sampling method proposed by Xiang
Liang was adopted [16]. A collaborative filtering algorithm based on PMF was used to decompose the rating
matrix into the implicit characteristics of users and projects [17]. First, it was assumed that the implicit feature vectors of students and entrepreneurial
projects obey a Gaussian distribution. The calculation is shown in Eq. (2).
where $N\left(u_{i}\left| \mu ,{\sigma }_{U}^{2}I\right.\right),N\left(v_{j}\left|
\mu ,{\sigma }_{V}^{2}I\right.\right)$ are probability density functions; $\mu $
represents the mean; ${\sigma }_{U}^{2},{\sigma }_{V}^{2}$ represent the variances;
$U,V$ represent the implied feature matrices of the student users and entrepreneurial
projects, respectively; $u_{i},v_{j}$ are the implied feature vectors of a student
user and an entrepreneurial project, respectively; and $I$ is the identity matrix.
The conditional probabilities of the observed ratings were assumed to follow a
Gaussian prior distribution, as expressed in Eq. (3).
where ${I}_{i,j}^{R}$ is the indicator, which is 1 when user $i$ rates item
$j$ and 0 when there is no rating. The posterior probabilities of the implied features
of the user and the item were obtained using the Bayesian function, as in Eq. (4).
The implied features $V$ of the ventures are not obtained by probabilistic matrix
factorization alone; instead, they are calculated using a DNN. Assume $W$ is the set
of parameters of the DNN, and it is Gaussian distributed. The feature $X_{j}$
representing the $j$\textsuperscript{th} venture is input, and the implied feature
$V_{j}$ of the $j$\textsuperscript{th} venture is calculated using Eq. (5).
where $dnn\left(W,X_{j}\right)$ stands for the DNN, and $\varepsilon _{j}$ is the random
error. Assuming the set of parameters $W$ also conforms to a Gaussian distribution,
the probability distribution of $W$ and the conditional probability distribution of
the item implied feature $V$ are expressed in Eq. (6).
Once the implied feature matrix $V$, the student user implied features $U$, and the
set of DNN parameters $W$ are solved, the probability distribution can be calculated
in Eq. (7).
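To make the setup above concrete, the implicit rating of Eq. (1) and the PMF-style prediction can be sketched as follows; the sizes, interaction dimensions, weights, and data are illustrative assumptions, not values from the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 student users, 5 projects, 3 interaction
# dimensions (e.g. click, favorite, share) -- illustrative only.
n_users, n_items, n_dims, k = 4, 5, 3, 8

# Eq. (1): implicit rating as a weighted sum over interaction dimensions,
# r_ui = sum_j a_j * q_{u,i,j}.
a = np.array([0.2, 0.3, 0.5])                            # dimension weights a_j
q = rng.integers(0, 4, size=(n_users, n_items, n_dims))  # interaction counts
R = np.tensordot(q, a, axes=([2], [0]))                  # implicit rating matrix

# PMF assumption (Eq. (2)): latent factors U, V drawn from Gaussian priors.
U = rng.normal(0.0, 0.1, size=(n_users, k))
V = rng.normal(0.0, 0.1, size=(n_items, k))

# The predicted rating of item i by user u is the inner product U[u] @ V[i].
R_hat = U @ V.T
print(R.shape, R_hat.shape)  # (4, 5) (4, 5)
```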
Startup projects contain considerable characteristic information, comprising both
structured and unstructured data. More useful characteristics can be extracted from
this content by combining word embedding and one-hot techniques to build a CNN,
which is used to extract the implicit features of the projects. Fig. 1 presents the DNN structure [18].
Fig. 1. DNN Structure for Extracting the Project Features.
The input layer of this network structure is the cleaned and processed startup project
feature data (Fig. 1), which consists of four parts. The word vector method is mainly used to process
the labels, and the word2vec algorithm is used to train all the label sequences and
obtain the label vectors of the entrepreneurial projects. The trained label vectors
cannot be used directly as input to neural networks because there are multiple labels
for each entrepreneurial project. The study therefore used the label-vector summation
and averaging method: assuming the set of labels for each entrepreneurial project is
$T=\left\{t_{1},t_{2},\ldots ,t_{l}\right\}$, where $t_{i}\in \mathbb{R}^{1\times k}$,
the label set is processed as expressed in Eq. (8).
where $l$ is the number of labels of the entrepreneurial project and $p$ denotes the
processing result. To improve the accuracy of the implied features of the entrepreneurial
projects, this study used a CNN, whose structure is shown in Fig. 2, to process the
text description of the venture [19].
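The label processing of Eq. (8) amounts to summing and averaging the trained label vectors; a minimal sketch, with small fixed vectors standing in for word2vec outputs:

```python
import numpy as np

# Sketch of Eq. (8): each project has l label vectors t_1..t_l (here
# fixed stand-ins for word2vec outputs, dimension k = 4), and the pooled
# representation p is their mean.
labels = [np.array([1.0, 0.0, 2.0, 1.0]),
          np.array([3.0, 2.0, 0.0, 1.0])]
p = sum(labels) / len(labels)   # p = (1/l) * sum_i t_i
print(p)  # [2. 1. 1. 1.]
```

This fixed-length average is what allows a variable number of labels per project to feed a fixed-width network input.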
Fig. 2. Structure of Text CNN.
The network structure consisted of seven main components (Fig. 2). First, the embedding layer converts the word-segmentation results
of a piece of text into a matrix, which is then used as input to the text CNN.
The convolutional layer is designed to extract local features of the text or image
using local receptive fields, shared weights, and biases. The convolution operation
of the CNN for this study is expressed as Eq. (9).
where ${c}_{i}^{j}$ is the activation value on a convolutional kernel; $f$ represents
the activation function of a neuron; ${b}_{c}^{j}$ is the shared bias; $w_{i,j}$ represents
the shared weights; and $D$ stands for the input to the convolutional layer. The goal
of the pooling layer is to simplify the output of the convolutional layer; it is, in
effect, a sampling operation. The fusion layer accepts the output of the CNN and
directly splices it horizontally with the region and label features, which then serve
as input to the lower layer. The fully connected layer extracts abstract item
characteristics from the fused features, as calculated using Eq. (10).
where $f$ stands for the activation function; $W_{fc}$ is the set of parameters; and
$b_{fc}$ is the set of biases. Finally, the output layer converts the output of the
previous layer into a vector of implicit item features of the specified dimension.
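A minimal numeric sketch of the convolution, pooling, and fully connected layers described above (Eqs. (9)-(10)); all layer sizes, weights, and the ReLU activation are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: 10 words, embedding dim 8, kernel height 3, 4 filters.
seq_len, embed_dim, kernel_h, n_filters = 10, 8, 3, 4
D = rng.normal(size=(seq_len, embed_dim))   # embedding-layer output

relu = lambda x: np.maximum(x, 0.0)

# Convolution (Eq. (9)): c_i = f(w * D[i:i+h] + b_c) with shared w, b_c.
w = rng.normal(size=(n_filters, kernel_h, embed_dim))
b_c = rng.normal(size=n_filters)
c = np.stack([relu(np.tensordot(D[i:i + kernel_h], w,
                                axes=([0, 1], [1, 2])) + b_c)
              for i in range(seq_len - kernel_h + 1)])   # (8, 4)

# Pooling layer: max-over-time sampling of each filter's activations.
pooled = c.max(axis=0)                                   # (4,)

# Fully connected layer (Eq. (10)): f(W_fc @ pooled + b_fc).
W_fc, b_fc = rng.normal(size=(6, n_filters)), rng.normal(size=6)
out = relu(W_fc @ pooled + b_fc)
print(c.shape, pooled.shape, out.shape)  # (8, 4) (4,) (6,)
```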
The above algorithm is optimized using a maximum a posteriori estimation approach,
with the optimization objective shown in Eq. (11).
where $\lambda _{U}=\frac{\sigma ^{2}}{{\sigma }_{U}^{2}},\lambda _{V}=\frac{\sigma
^{2}}{{\sigma }_{V}^{2}},\lambda _{W}=\frac{\sigma ^{2}}{{\sigma }_{W}^{2}}$ are the
regularization parameters of the error function, which need to be determined
experimentally. The optimal $U,V,W$ are obtained by minimizing Eq. (11). The study
uses the gradient descent algorithm: the first-order partial derivatives of this
equation with respect to $U,V$ are set equal to 0 to find the direction of gradient
descent, as expressed in Eq. (12).
where $W$, as the set of DNN parameters, is not directly available in closed form;
however, once $U,V$ are determined, it can be found by combining the DNN with the
matrix factorization. The implicit features $U,V$ are then used to predict the missing
rating information in the rating matrix $R$.
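A toy sketch of the alternating gradient-descent updates implied by Eq. (12); the learning rate, regularization values, and the random stand-in for $dnn(W,X_j)$ are assumptions, not the study's settings:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: 6 users, 5 projects, latent dimension 3.
n_users, n_items, k = 6, 5, 3
R = rng.random((n_users, n_items))
mask = (rng.random(R.shape) < 0.6).astype(float)  # indicator I_ij
dnn_feat = rng.normal(size=(n_items, k))          # stand-in for dnn(W, X_j)

U = rng.normal(0, 0.1, (n_users, k))
V = rng.normal(0, 0.1, (n_items, k))
lam_U = lam_V = 0.1
lr = 0.05

for _ in range(200):
    E = mask * (R - U @ V.T)        # rating error on observed entries only
    # Gradient steps for U and V; V is additionally pulled toward the
    # DNN-produced item features, as in the MAP objective of Eq. (11).
    U += lr * (E @ V - lam_U * U)
    V += lr * (E.T @ U - lam_V * (V - dnn_feat))

loss = np.sum((mask * (R - U @ V.T)) ** 2)
print(loss >= 0.0)  # True
```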
3.2 DNN Recommendation Algorithm for the Construction of a Recommendation Model for Industry-Education-Integrated Entrepreneurship Projects
Suppose only neural networks are used to learn the user’s implicit characteristics
from their content features. In that case, the recommendation algorithm does not consider
the user’s behavioral information, which is equivalent to its degradation to a content-based
recommendation algorithm. In general, the content features of student users tend to
be static and unchanging, and modeling only the content feature information of student
users will inevitably reduce the accuracy of the recommendation results. Hence, the
historical behavior also needs to be modeled. Assuming that the implied features of
the student user and the entrepreneurship project obtained from the decomposition
of the rating matrix are $U,V$, respectively, the predicted rating of the project
by this user can be calculated using Eq. (13).
where $i$ represents a student user and $j$ represents an entrepreneurial project;
$u_{i}$ represents the implicit feature of the student user, and $v_{j}$ stands for
the implicit feature of the entrepreneurial project. The optimized loss function
of the collaborative algorithm based on the implied features is shown in Eq. (14) [20].
where $r_{i,j}$ is the true rating of entrepreneurial project $j$ by student user $i$;
${\sum }_{i=1}^{n}{\sum }_{j=1}^{n}\left(r_{i,j}-\widehat{r_{i,j}}\right)^{2}$
is the squared error between the true and predicted ratings; $\lambda _{U}\left|\left|U\right|\right|^{2}+\lambda
_{V}\left|\left|V\right|\right|^{2}$ is the regularization term; and $\lambda _{U},\lambda
_{V}$ are the regularization parameters. Eq. (14) can be transformed into Eq. (15), which represents the implicit feature $U$ as a function
of the student user's historical rating matrix $R$ [21].
where $f$ is a non-linear transformation; $R$ is the scoring matrix; $W$ is the set
of parameters. The implicit interest features of the student user are also influenced
by the user content features, i.e., the implicit user features $U$ are functionally
related to the content features of the student user. Assuming the user content feature
vector is $X$, this can be translated to Eq. (16).
For the student user $i$, the implicit feature vector is expressed as Eq. (17).
CNNs can be used for model building because of their ability to approximate arbitrary
functions, which mitigates the negative impact of data sparsity on the model. The
student user ratings are reduced in dimensionality, and the results are fed into the
neural network. The RBM algorithm reduces the dimensionality of the student user's
rating vector: the visible layer represents the student's rating data, and the hidden
layer stands for the reduced-dimensional data. The reduced-dimensional rating data
are used as input to the CNN; Fig. 3 shows the structure of the student user
feature-extraction network.
Fig. 3. Network Structure Diagram of User Feature Extraction.
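The RBM dimensionality reduction described above can be sketched with one step of contrastive divergence per iteration; the layer sizes, learning rate, and binarized ratings are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes: 20 visible units (ratings), 5 hidden units.
n_visible, n_hidden = 20, 5
ratings = (rng.random((8, n_visible)) < 0.2).astype(float)  # sparse 0/1 ratings

W = rng.normal(0, 0.1, (n_visible, n_hidden))
b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
lr = 0.1

for _ in range(100):
    v0 = ratings
    h0 = sigmoid(v0 @ W + b_h)                             # hidden activation
    h_sample = (h0 > rng.random(h0.shape)).astype(float)   # sample hidden units
    v1 = sigmoid(h_sample @ W.T + b_v)                     # reconstruction
    h1 = sigmoid(v1 @ W + b_h)
    # Contrastive-divergence (CD-1) update of weights and biases.
    W += lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
    b_v += lr * (v0 - v1).mean(axis=0)
    b_h += lr * (h0 - h1).mean(axis=0)

# The hidden-layer activations are the reduced-dimensional rating features.
reduced = sigmoid(ratings @ W + b_h)
print(reduced.shape)  # (8, 5)
```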
In addition to the reduced-dimensional rating data (Fig. 3), other characteristics of the student user, such as the student's region, profile,
and areas of interest, can also be used as network input. In summary, the research
proposes a DNN-based CNN recommendation algorithm as the recommendation model for
the student industry-education-integration entrepreneurship project, as shown in Fig. 4.
Fig. 4. Recommended Model of a Student Industry Education Integration Entrepreneurship Project.
From Fig. 4, the left side of this model is the student user network model, and the right is
the entrepreneurial project network model. The entrepreneurial project network structure
uses a CNN to process the unstructured text information. Word embedding and one-hot
are used to process the others. The student model consists mainly of modeling the
historical behavioral and content features of the student users. The historical
behavioral information of the student user is processed with the RBM algorithm and
then used as input to the network. The remainder of the processing is similar to
that of the content features of the entrepreneurship project.
The network processing of the model is roughly divided into three stages. The first
is the feature-and-stitching stage, where the student user and entrepreneurial
project features are processed separately and then stitched together to generate
the corresponding high-dimensional spatial vector representations. Next comes the
fully connected layer-by-layer abstraction stage, where the high-dimensional spatial
vectors of student users and entrepreneurial projects are again passed through fully
connected layers, each extracting features from the student users and entrepreneurial
projects and abstracting them layer by layer. The final output is a vector of implicit
features of the student user and the entrepreneurial project in the same vector space.
The scoring stage is the final stage of the model, in which the inner product of the
implied features of the student user and the entrepreneurial project, i.e., of the
two network outputs, is calculated; this inner product is the predicted student user score.
The pre-training of the model consists of training the RBM network, which aims to
extract dense features from the sparse rating data [22]. During the supervised training phase, the student rating features obtained from
pre-training are combined with the other features and used as input to the student
network, while the text features and other features of the entrepreneurial project
are used as input to the project network [23,24]. The calculation of each network layer is expressed in Eq. (18).
where $X_{h}$ represents the activation value of the $h$th layer; $W_{i}$ is the set
of parameters; $b_{i}$ represents the bias; and $\sigma$ represents the activation
function. The activation function used in the study is calculated using Eq. (19).
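The per-layer computation of Eq. (18) can be sketched as follows; the layer widths and the use of a sigmoid for $\sigma$ are assumptions for illustration, not the study's actual choices:

```python
import numpy as np

rng = np.random.default_rng(6)

# Assumed activation for illustration (the study defines its own in Eq. (19)).
sigma = lambda x: 1.0 / (1.0 + np.exp(-x))

# Eq. (18): X_h = sigma(W_h X_{h-1} + b_h), stacked over three layers.
layer_sizes = [16, 12, 8, 4]         # hypothetical layer widths
X = rng.normal(size=layer_sizes[0])  # input feature vector
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    W = rng.normal(0, 0.1, (n_out, n_in))
    b = np.zeros(n_out)
    X = sigma(W @ X + b)             # activation value of layer h

print(X.shape)  # (4,)
```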
In the training of the supervised model, the objective function is expressed as Eq.
(20).
where $W_{u},W_{v}$ are the sets of neural network parameters used to extract the
student user and entrepreneurial project features, respectively. The study uses the
gradient descent method to obtain $W_{u},W_{v}$, with which the ratings of all student
users on all entrepreneurial projects can be calculated. After ranking the ratings
from largest to smallest, the top $k$ entrepreneurial projects are selected as the
recommendation results.
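The final ranking step can be sketched as follows, with random scores standing in for the model outputs:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical scores: 3 student users, 10 entrepreneurial projects.
n_users, n_items, k = 3, 10, 4
scores = rng.random((n_users, n_items))

# Sort each row in descending order and keep the k best project indices.
top_k = np.argsort(-scores, axis=1)[:, :k]
print(top_k.shape)  # (3, 4)
```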