
  1. (School of New Generation Information Technology Industry, Shandong Polytechnic, Jinan, 250104, China)



Keywords: Advertisement, aesthetic appreciation, convolutional neural network, data aggregation, training, evaluation

1. Introduction

Advertising is a form of communication that disseminates information to the public through various media channels to serve specific needs [1]. With advances in science and technology and the ongoing evolution of human aesthetics, society has entered an era of information aesthetics, in which visual media influence people to an ever greater degree [2]. Many enterprises rely on print advertisements as a vehicle for corporate publicity, and therefore need those advertisements to communicate effectively [3-5]. The effectiveness of advertising design (AD) lies in its visual representation: aesthetically superior graphic design can evoke visual resonance in viewers [6]. Traditional image aesthetic assessment (IAA) for AD relies largely on manual review, which suffers from strong subjectivity and inconsistent assessment criteria [7]. Deep-learning-based image aesthetic evaluation methods fall into two main categories: those based on image features and those based on image content. Feature-based methods focus on visual attributes of the image, such as color, texture, and shape, and evaluate aesthetic quality by extracting and analyzing these features. Although deep-learning-based evaluation can alleviate the subjectivity of traditional methods to some extent, its accuracy and reliability remain unsatisfactory. In addition, how to handle differences in cultural background and aesthetic preference is an open problem in current research [8]. To address these issues, this study proposes a two-layer neural network model grounded in human visual aesthetics. The model integrates knowledge of human visual perception into the structural design of a CNN and learns perception-related aesthetic rules through two sub-networks.
A model training strategy based on data aggregation was established to further optimize model performance. The aim of this study is to provide a more objective and scientific aesthetic evaluation method for the advertising design industry. Its novelty lies in proposing a two-layer neural network based on human visual aesthetics and applying it to IAA, together with a training strategy that measures sample aggregation to guide model training. The study comprises four parts. The first part is the literature review, which surveys the application of intelligent technology in AD and examines related domestic and international research, thereby establishing the research direction. The second part presents the methods, covering the steps involved in building the model. The third part examines and compares the model's performance to demonstrate its advantages in IAA. The fourth part concludes, summarizing the preceding sections and outlining future work.

2. Related Works

Print advertisements are visual images produced to a specific design that caters to the needs of the advertiser. As people's aesthetics evolve, scholars have shifted their focus toward evaluating the quality and aesthetic appeal of these visuals. To address the subjectivity of existing IAA, Zhu et al. proposed an IAA method based on bi-level gradient optimization meta-learning. The method was trained directly on personal aesthetic data and generalized quickly to unknown users, using aesthetic data from many users to update the learner model through bi-level gradients over support and query sets [9]. To automate the assessment of image aesthetic quality, Yan et al. proposed a semantic-aware multi-task CNN. Under the joint supervision of a semantic classification task and an aesthetic quality evaluation task, the network produced a more thorough and accurate aesthetic representation through multi-task learning [10]. Xiang et al. designed a neural network that jointly categorizes images by style while predicting a distribution of aesthetic ratings; during classifier training, an angular softmax loss was applied to the single-label training data [11]. Niu et al. proposed an image aesthetic evaluation method based on review-guided semantic perception, addressing the tendency of existing methods to rely too heavily on visual features while ignoring the rich semantics of images. Image semantics are first modeled as the topic features of the corresponding comments using latent Dirichlet allocation; a dual-stream multi-task learning framework then jointly predicts topic features and the aesthetic score distribution. Experiments showed that this method outperforms current advanced image aesthetic evaluation methods [12].
To evaluate the aesthetics of interface design, Wang et al. proposed an improved gray H-convex correlation model and used the ICRITIC method to study the mapping between interface layout aesthetics and visual cognitive features. The results showed that the evaluation accuracy of this method exceeds 90% [13].

CNN is a deep learning technique that automatically extracts features from images and performs classification and recognition by learning from large amounts of image data; its application areas are very wide. To achieve more effective image denoising, Ilesanmi et al. surveyed CNN-based image denoising methods, classifying and analyzing the different approaches and reviewing their motivations and principles graphically [14]. Cao et al. proposed a new deep learning technique with the dual goals of decreasing annotation cost and enhancing CNN performance in hyperspectral image classification. The approach combined deep learning and active learning in one cohesive framework: the CNN was initially trained with a small number of labeled pixels, the most informative pixels were then selected for labeling, and the new labels were merged into the training set to continue training the model [15]. Van et al. developed a one-dimensional CNN for rainfall-runoff modeling, applying two convolutional filters in parallel to separate time series. Evaluated on measured weather-station data, the model effectively learned the dependencies between and within sequences [16]. To achieve higher accuracy in brain tumor diagnosis, Irmak used CNNs for multi-class classification of brain tumor images, with a grid-search optimization algorithm automatically configuring the three constructed CNN models. The method achieved good classification results on public clinical datasets [17]. Wieczorek et al. devised a face detection technique for dangerous scenarios that helps rescue teams find victims more quickly. To identify faces in hazardous situations such as mines, avalanches, and underwater environments, the model used a lightweight CNN architecture. The results demonstrated a detection accuracy of more than 99% [18].

The literature synthesis reveals that CNN has superior performance for both image processing and feature extraction. Nevertheless, existing CNN-based IAA methods face challenges such as inadequate data utilization and unreasonable extraction of feature details. Therefore, this study enhances the CNN and develops a novel IAA model for AD to serve as a scientific reference for advertising designers.

3. Methods

To evaluate image aesthetics in advertising design, an aesthetic evaluation model based on CNN and data aggregation is proposed. Firstly, a two-path CNN is proposed: the first sub-network extracts region-of-interest features, and the second is a multi-scale information sub-network. Visual Geometry Group (VGG) and ResNet-50 are selected as the feature extraction networks, and several small convolutional blocks together with ResNet are used as the multi-scale information fusion network. After establishing the aesthetic evaluation model, the study found that its training accuracy was not ideal, so a new training optimization method based on data aggregation was established to further improve the performance of the aesthetic evaluation model.

3.1. CNN-based Image Aesthetic Assessment for Advertising Design

Aesthetic evaluation of graphic advertising design images can further improve the aesthetic quality of advertising design. Therefore, this paper proposes an aesthetic evaluation model for advertising design images based on CNN and data aggregation. A convolutional two-path image aesthetic evaluation network is proposed: VGG and ResNet-50 are selected as the feature extraction networks, and multi-scale convolutional layers together with ResNet serve as the multi-scale information fusion network. The aesthetics of advertising designs are evaluated through these two paths. After the model is established, a model training optimization method based on data aggregation is proposed to further determine the optimization direction and improve evaluation accuracy. CNN is a deep learning model particularly suited to image data: it extracts features through multilayer convolution and pooling operations and performs classification or regression through fully connected layers [19, 20]. The proposed CNN-based IAA method for AD is shown in Fig. 1.

Fig. 1. Aesthetic evaluation method of advertising design image based on CNN.

../../Resources/ieie/IEIESPC.2026.15.2.163/fig1.png

The CNN-based IAA method for AD employs two path sub-networks that extract image features and model them for aesthetic assessment of AD images. The first sub-network is the region-of-interest sub-network. To extract regional features more flexibly, the study determines the degree of interest of a local region from its information density: a region with higher information density is likely to be more appealing. A network was designed to extract and model multiple local regions with high information density; it can be trained end-to-end without manual labeling, avoiding the interference of subjective noise. The second sub-network is a multi-scale information sub-network designed to provide rich and diverse global descriptive features that further enhance model performance. The study combines shallow and deep features through a multi-layer information fusion structure to support decision-making. Finally, the decision results of the two networks are fused to produce the final judgment. This method can effectively assess the aesthetic quality of AD images with high accuracy and reliability. The prediction function for AD image quality assessment in the region-of-interest sub-network is shown in Eq. (1).

(1)
$ \phi = P(\hat{y}^{(i)} | Z^{(i)})P(Z^{(i)} | F^{(i)}). $

In Eq. (1), the prediction function is $\phi$, and the output conditional probability distribution is $P(\hat{y}^{(i)} | Z^{(i)})$. The prediction result is $\hat{y}^{(i)}$, the output region of interest variable feature is $Z^{(i)}$, and the deep learning feature vector of the image object region is $F^{(i)}$. The region of interest subnetwork structure is shown in Fig. 2.

Fig. 2. Region of interest subnetwork structure.

../../Resources/ieie/IEIESPC.2026.15.2.163/fig2.png

In the design of the region-of-interest sub-network, the study chooses the Visual Geometry Group network (VGG) and Residual Network-50 (ResNet-50) as feature extraction networks, both of which perform excellently in image recognition. VGG captures fine image features through its deep convolutional layers, while ResNet-50 solves the vanishing gradient problem in deep network training through residual connections, allowing a deeper network and thus richer features. These two networks extract the local and global features of images, respectively. Meanwhile, the study employs a multilayer perceptron as the predictive classifier for image aesthetics to achieve high-precision evaluation. The center points of the image objects are calculated as shown in Eq. (2).

(2)
$ \{c_j\}_{j\in[1,n_1+n_2]} = \{\{c_{BBi}\}_{i\in[1,n_1]}; \{c_{SALi}\}_{i\in[1,n_2]}\}. $

In Eq. (2), the centroids of the object regions are $\{c_j\}$, the centroids from object detection are $\{c_{BBi}\}$, the centroids of the connected subgraphs computed from the binarized saliency map are $\{c_{SALi}\}$, and the total number of region points is $n_1+n_2$, where $n_1$ and $n_2$ are the numbers of region points produced by the two convolutional networks, respectively. The final set of subject regions obtained after introducing the anchor mechanism is shown in Eq. (3).
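As a minimal illustration (the array shapes and values are assumptions, not from the paper), the concatenation in Eq. (2) simply pools the two centroid sources into one candidate set:

```python
import numpy as np

def merge_centroids(c_bb, c_sal):
    """Concatenate detector centroids {c_BBi} (n1 points) with
    saliency-map connected-subgraph centroids {c_SALi} (n2 points)
    into one candidate set {c_j}, j in [1, n1+n2] (Eq. 2)."""
    return np.concatenate([np.asarray(c_bb), np.asarray(c_sal)], axis=0)

# Two detector centers and three saliency centers -> five candidates
c = merge_centroids([[10, 20], [30, 40]], [[5, 5], [50, 60], [70, 80]])
assert c.shape == (5, 2)
```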

(3)
$ s\{c_j\}, j \in [1, n_1 +n_2]. $

In Eq. (3), the set of subject regions is $s\{c_j\}$; with $k$ anchor boundaries, the number of subject regions in the set is $k \times (n_1 + n_2)$. The cost function for semantic evaluation is shown in Eq. (4).

(4)
$ L = \alpha \sum_{i=1}^{M} \left| \frac{S(p_i)}{\omega_i \times h_i} \right| + \beta \sum_{i=1}^{M} |\omega_i \times h_i| - \gamma \sum_{i, j=1, i \neq j}^{M} H(p_i, p_j). $

In Eq. (4), the number of regions contained in the subject region set is $M$, one of the regions is $p_i$, and the corresponding length and width of the region are $h_i$ and $\omega_i$. The significant sum of all the pixel points in the region is $S(p_i)$, the evaluation function is $H$, and the weight coefficients of the three functions are $\alpha$, $\beta$, and $\gamma$, respectively. The feature expression of the original image extracted by the first four layers of convolution is shown in Eq. (5).
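The cost in Eq. (4) balances per-region saliency density, total region area, and a pairwise evaluation between regions. A minimal NumPy sketch, with illustrative weight values and a hypothetical pairwise matrix `H` (neither is specified in the paper):

```python
import numpy as np

def region_cost(S, w, h, H, alpha=1.0, beta=0.1, gamma=0.5):
    """Cost of a subject-region set (Eq. 4).
    S[i]: summed saliency of region p_i; w[i], h[i]: its width/height;
    H[i, j]: pairwise evaluation (e.g. overlap) between p_i and p_j.
    alpha/beta/gamma are illustrative weights, not values from the paper."""
    S, w, h = map(np.asarray, (S, w, h))
    area = w * h
    density_term = np.abs(S / area).sum()    # saliency per unit area
    area_term = np.abs(area).sum()           # total region area
    off = ~np.eye(len(S), dtype=bool)        # all pairs with i != j
    pair_term = np.asarray(H)[off].sum()
    return alpha * density_term + beta * area_term - gamma * pair_term

# Two candidate regions with a small mutual overlap score
H = np.array([[0.0, 0.2], [0.2, 0.0]])
L = region_cost(S=[30.0, 10.0], w=[5.0, 2.0], h=[4.0, 5.0], H=H)  # -> 5.3
```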

(5)
$ F^{(i)} = Ext(I^{(i)}, P^{(i)}). $

In Eq. (5), the feature extraction function is $Ext$, the original image input is $I^{(i)}$, and the corresponding deep convolutional features of the region are $P^{(i)}$. The transformation between a given set of feature vectors and the features of the region of interest is shown in Eq. (6).

(6)
$ Z = \sigma(\psi(W \times F)) \otimes F. $

Fig. 3. Multi-scale Information subnetwork structure.

../../Resources/ieie/IEIESPC.2026.15.2.163/fig3.png

In Eq. (6), the given set of feature vectors is $F$, the feature vector weights are $W$, the region of interest features are $Z$, the softmax function is $\sigma$, the activation function is $\psi$, and the multiplication operation is $\otimes$. The structure of the multi-scale information sub-network is shown in Fig. 3.
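The gating in Eq. (6) re-weights a region's feature vector by a softmax attention computed from it. A minimal NumPy sketch (random weights stand in for the learned $W$, and using ReLU for the activation $\psi$ is an assumption):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def roi_gate(F, W):
    """Eq. (6): Z = softmax(relu(W @ F)) * F.
    F: feature vector of one candidate region; W: weight matrix
    (learned in the paper; random here, for illustration only)."""
    relu = lambda x: np.maximum(x, 0.0)  # psi, the activation
    attn = softmax(relu(W @ F))          # sigma, attention over features
    return attn * F                      # elementwise (circled-times) re-weighting

rng = np.random.default_rng(0)
F = rng.normal(size=8)
W = rng.normal(size=(8, 8))
Z = roi_gate(F, W)
assert Z.shape == F.shape
```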

In the multiscale information subnetwork structure, different scale features are adjusted by small convolutional layers, and the transverse connection output is shown in Eq. (7).

(7)
$ \begin{cases} f'_{b2} = \sigma(W'_{b2} \times \psi(W_{b2} \times f_{b2})), \\ f'_{b3} = \sigma(W'_{b3} \times \psi(W_{b3} \times f_{b3})), \\ f'_{b4} = \sigma(W'_{b4} \times \psi(W_{b4} \times f_{b4})). \end{cases} $

In Eq. (7), the output features of the small convolutional blocks are $f_{b2}$, $f_{b3}$, and $f_{b4}$, with feature weights $W_{b2}$, $W_{b3}$, and $W_{b4}$, respectively. The laterally connected outputs are $f'_{b2}$, $f'_{b3}$, and $f'_{b4}$, with feature weights $W'_{b2}$, $W'_{b3}$, and $W'_{b4}$, respectively. The shallow features are generated as shown in Eq. (8).

(8)
$ f_{shallow} = \{f'_{b2}; f'_{b3}; f'_{b4}\}. $

In Eq. (8), the shallow feature is $f_{shallow}$. The deep feature output from the last layer of the network is shown in Eq. (9).

(9)
$ f_{deep} = f_{b5}. $

In Eq. (9), the output of the last layer of the network is $f_{b5}$. During feature encoding, the study uses two stacked $3 \times 3$ convolutional layers to extract the middle-layer features of the image, and controls the output feature size and channel count by adjusting the convolutional stride and adding $1 \times 1$ convolutional layers, realizing shallow-feature noise reduction and image downsampling. In the feature fusion stage, the three low-level features are merged into shallow features via global average pooling of the feature layers, dimensional compression, and concatenation, while the original ResNet model outputs the deep features. The overall loss of model training is shown in Eq. (10).

(10)
$ L_M = \lambda_s L_s + \lambda_d L_d. $

In Eq. (10), the overall loss is $L_M$, the shallow and deep feature losses are $L_s$ and $L_d$, respectively, and the weights of the two parts of the loss are $\lambda_s$ and $\lambda_d$, respectively.
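Eqs. (7)-(10) together describe the multi-scale fusion: lateral projections of three intermediate blocks, concatenation into a shallow feature, the last block as the deep feature, and a weighted two-part loss. A minimal sketch in which simple dense weights stand in for the small convolutional blocks (all shapes and weight values are illustrative assumptions):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_multiscale(f_b2, f_b3, f_b4, f_b5, laterals):
    """Eqs. (7)-(9): project the three intermediate block outputs through
    small lateral layers, concatenate them into f_shallow, and take the
    last block's output as f_deep. `laterals` holds (W, W') weight pairs;
    all weights here are stand-ins for learned parameters."""
    projected = [softmax(Wp @ relu(W @ f))           # Eq. (7)
                 for f, (W, Wp) in zip((f_b2, f_b3, f_b4), laterals)]
    f_shallow = np.concatenate(projected)            # Eq. (8)
    f_deep = f_b5                                    # Eq. (9)
    return f_shallow, f_deep

def overall_loss(L_s, L_d, lam_s=0.5, lam_d=0.5):
    """Eq. (10): weighted sum of shallow and deep losses
    (the weight values are illustrative)."""
    return lam_s * L_s + lam_d * L_d

rng = np.random.default_rng(0)
fs = [rng.normal(size=16) for _ in range(4)]
laterals = [(rng.normal(size=(8, 16)), rng.normal(size=(8, 8))) for _ in range(3)]
f_shallow, f_deep = fuse_multiscale(*fs, laterals)
assert f_shallow.shape == (24,) and f_deep.shape == (16,)
```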

3.2. Training Strategies Based on Data Aggregation

After the CNN-based aesthetic evaluation model is established, a training strategy based on data aggregation (DA) is proposed. Through this strategy, model performance is further optimized and evaluation accuracy is improved. To better mine the sparsely distributed samples in the dataset, the study uses feature similarity as the classification basis and proposes a DA-based dataset partitioning method. A training method that combines sparsely distributed data with compact samples is then devised to enhance the model's generalization ability and give it an optimization direction. The DA-based training strategy is shown in Fig. 4.

Fig. 4. Training strategy based on data aggregation.

../../Resources/ieie/IEIESPC.2026.15.2.163/fig4.png

In the DA-based training strategy, a CNN model is first trained on advertisement image aesthetic data to generate high-level deep features. This captures the semantic and abstract information in the images and is therefore well suited to assessing their semantic similarity. Next, a density-clustering-based partitioning method classifies the dataset by semantic similarity: the local densities and distances of the samples are calculated in a high-dimensional space to obtain an aggregated representation of the dataset and thus a better understanding of the semantic associations between samples. On this basis, the study adopts a Compact-to-Sparse training strategy that divides learning into a start-up phase and a retraining phase. In the start-up phase, an initialization model is trained on the entire dataset, this model is used to extract features and divide the dataset into three subsets, and learning then starts from the compact subset to obtain the start-up model. In the retraining phase, features are re-extracted, the three subsets are re-divided, and the sparse subsets are added for learning to further improve model performance. This training approach uses DA to enhance the model and increase its accuracy and dependability on advertising aesthetics tasks. The Euclidean distance between features used in the dataset division is calculated as shown in Eq. (11).

(11)
$ D_{ij} = \| f(P_i) - f(P_j) \|. $

In Eq. (11), the Euclidean distance between features $f(P_i)$ and $f(P_j)$ is $D_{ij}$. The local density of each image is calculated as shown in Eq. (12).

(12)
$ \rho_i = \sum_{j} X(D_{ij} - d_c). $

In Eq. (12), the local density of image $i$ is $\rho_i$, $j$ ranges over the other images, and $d_c$ is the cutoff distance. The density indicator function $X(d)$ is shown in Eq. (13).

(13)
$ X(d) = \begin{cases} 1, & d < 0, \\ 0, & \text{others}. \end{cases} $

In Eq. (13), the indicator function takes the value 1 when its argument $d = D_{ij} - d_c$ is negative, i.e., when two images are closer than the cutoff distance, and 0 otherwise. The distance $\theta_i$ for each image is defined as shown in Eq. (14).

Fig. 5. Compact to sparse training strategy.

../../Resources/ieie/IEIESPC.2026.15.2.163/fig5.png
(14)
$ \theta_i = \begin{cases} \min(D_{ij}), & \text{if } \exists j, \text{ s.t. } \rho_j > \rho_i, \\ \max(D_{ij}), & \text{others}. \end{cases} $

In Eq. (14), cluster centers are selected based on the distance between images and the local density; the principle is to find images that have both high local density and large distance from other, denser images. Such images make good cluster centers because they are high-density points surrounded by lower-density points. Meanwhile, points with large distances but low local densities can be considered anomalies. The Compact-to-Sparse training strategy is shown in Fig. 5.
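Eqs. (11)-(14) can be sketched directly in NumPy; the following is an illustration of the density-peak quantities under assumed toy data, not the paper's exact code:

```python
import numpy as np

def density_peaks(features, d_c):
    """Density-peak quantities for dataset partitioning (Eqs. 11-14).
    features: (n, d) deep-feature matrix; d_c: cutoff distance.
    Returns local densities rho_i and separation distances theta_i;
    cluster centers are points where both values are large."""
    n = len(features)
    # Eq. (11): pairwise Euclidean distances
    D = np.linalg.norm(features[:, None] - features[None, :], axis=-1)
    # Eq. (12)/(13): count neighbors closer than d_c, excluding self
    rho = ((D < d_c).sum(axis=1) - 1).astype(float)
    theta = np.empty(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        # Eq. (14): distance to nearest denser point, else max distance
        theta[i] = D[i, higher].min() if len(higher) else D[i].max()
    return rho, theta

# Three tightly packed points and one isolated point
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
rho, theta = density_peaks(pts, d_c=0.5)
assert rho[3] < rho[0]  # the isolated point has the lowest density
```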

The goal of training for aesthetic assessment of advertising images is to discover latent image features and aesthetic rules. Because the parameter search space of the CNN model is huge, the initial learning direction is crucial for converging to a good local optimum. The study therefore designed a Compact-to-Sparse training strategy that learns the partitioned dataset in stages. Before training begins, the aggregation of samples in the dataset is partitioned using a density-clustering algorithm: a CNN model is first trained on the full training data, then high-level features are extracted and the dataset is divided. In the start-up phase, a compactly distributed subset is taken from the partitioned dataset and a new model is trained on it to learn regular aesthetic features and rules. After the model converges, the aggregation of the training samples is re-evaluated and re-partitioned using the new model, and the model is fine-tuned accordingly. Sparsely distributed images are also added to the training set with smaller weights so the model can learn more unique and complex aesthetic rules. Compared with standard neural network training, the Compact-to-Sparse strategy learns relatively simple but effective decision boundaries first. The overall loss function during data training is shown in Eq. (15).

(15)
$ L = \omega_0 L_0 + \omega_1 L_1 + \omega_2 L_2. $

In Eq. (15), the overall loss is $L$, and the losses for each stage of data are $L_0$, $L_1$, and $L_2$, whose corresponding weights are $\omega_0$, $\omega_1$, and $\omega_2$, respectively.
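Under the Compact-to-Sparse schedule, Eq. (15) reduces to different weight settings per phase. A minimal sketch; the phase names and the (1,0)/(1,0.5,0.5)-style weight choices echo the settings reported in Table 1, but the exact mapping is an assumption:

```python
def staged_loss(L0, L1, L2, phase):
    """Eq. (15): L = w0*L0 + w1*L1 + w2*L2, where L0/L1/L2 are the
    losses on the compact, sparse, and highly sparse subsets.
    The per-phase weights below are illustrative."""
    weights = {
        "startup": (1.0, 0.0, 0.0),  # learn from the compact subset only
        "retrain": (1.0, 0.5, 0.5),  # add down-weighted sparse subsets
    }[phase]
    w0, w1, w2 = weights
    return w0 * L0 + w1 * L1 + w2 * L2

assert staged_loss(1.0, 2.0, 4.0, "startup") == 1.0
assert staged_loss(1.0, 2.0, 4.0, "retrain") == 4.0
```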

4. Results and Discussion

To verify the performance of the constructed model and the rationality of the techniques used, a series of experiments is designed. The model is also deployed in a practical application to judge its effect.

4.1. Effects of Aesthetic Feature Extraction

To establish connections between different features, the study extracts features from the low convolutional layers and coordinates features of different scales. To test the feature extraction method designed in the study, the experiment compares CNN-based multi-scale feature extraction with scale-invariant feature transform (SIFT), speeded-up robust features (SURF), and histogram of oriented gradients (HOG). Using images of different data volumes, each method extracts features, and the discriminability, computational efficiency, and correlation of the features extracted by the four algorithms are compared. The specific results are shown in Fig. 6.

Fig. 6. Comparison of the feature extraction effects of the four feature extraction methods.

../../Resources/ieie/IEIESPC.2026.15.2.163/fig6.png

As can be seen in Fig. 6(a), the discriminability of the features extracted by CNN is 0.68, versus 0.44 for SIFT, 0.40 for SURF, and 0.50 for HOG. All four algorithms fall in a relatively good interval, with CNN significantly higher than the other three. Fig. 6(b) shows that the correlation coefficient of CNN is 0.004, so its feature variables are essentially linearly independent; compared with the other three algorithms, CNN extracts better features. This is due to its parallel computing nature and the optimization of deep learning frameworks: although CNN requires more computing resources for feature extraction, its efficient parallel processing keeps overall computational efficiency at a high level. Fig. 6(c) compares the correlation of the extracted features across images: 0.82 for CNN, 0.70 for SIFT, 0.68 for SURF, and 0.75 for HOG. The high feature correlation of CNN indicates good consistency among different images, which helps subsequent image classification and recognition tasks.

The experiment then trains models using the features derived by the four algorithms to further assess the efficacy of the feature extraction techniques developed in the study. The training results are shown in Fig. 7.

Fig. 7. The training situation of the resulting feature training model extracted by different algorithms.

../../Resources/ieie/IEIESPC.2026.15.2.163/fig7.png

In Fig. 7, the model trained on CNN-extracted features converges best. In Fig. 7(a), the CNN-trained model stabilizes at the target accuracy within 12 iterations, while the models trained on the other three feature sets all need more than 16. In Fig. 7(b), the recall of the CNN-trained model converges in 14 iterations, better than the other three algorithms. In Fig. 7(c), its loss value starts to converge at 10 iterations, more than 5 iterations fewer than the other three algorithms. This demonstrates that CNN-extracted features enhance the model's training convergence.

4.2. Comparative Analysis of Model Evaluation Results

Most current image aesthetic quality assessment methods divide images into only two tiers, high and low; fewer methods provide graded aesthetic quality metrics. To verify the advantages of the model constructed in this study (Model 1), the AVA dataset was chosen for the experiment. Aesthetic assessment was performed on four categories: natural scenery, urban architecture, portraits, and animals, and the resulting scores were compared with the AVA dataset scores. The results are shown in Fig. 8.

Fig. 8. Model score fit to the AVA dataset.

../../Resources/ieie/IEIESPC.2026.15.2.163/fig8.png

According to Figs. 8(a)-8(d), the evaluation results for the four categories of natural scenery, urban architecture, portrait, and animal images are very close to the AVA dataset ratings, with an average fitting degree of 0.907. The fitting degree for urban architecture is the highest at 0.921, while that for animals is 0.893. This is because urban architecture and portrait images often have obvious aesthetic features and regularities, making it easier for the model to capture these features and evaluate accurately. In contrast, natural scenery and animal images are relatively difficult to evaluate owing to their complex, varied natural elements and dynamic features, resulting in slightly lower fitting accuracy. To further examine performance, the study compared the model with an IAA model based on a lightweight network (Model 2) and an IAA model based on a context-aware attention mechanism (Model 3) on the AADB and AVA databases. The comparison results are shown in Fig. 9.

Fig. 9. Performance comparison results of the three models.

../../Resources/ieie/IEIESPC.2026.15.2.163/fig9.png

In Fig. 9(a), the classification accuracy of Model 1 reaches 85.79%, while that of Model 2 is 80.43%; the classification accuracy of the other two models is below 80%. This stems from the advantages of Model 1 in feature extraction and model structure design: it adopts a residual network structure, which lets the network learn deeper feature representations of images while alleviating the vanishing gradient problem. In Fig. 9(b), the mean scoring error of Model 1 is 0.521, which is 0.485 lower than that of Model 2, 0.600 lower than that of Model 3, and 0.584 lower than that of Model 4. Model 1 thus has high accuracy in IAA.

4.3. Data Aggregation-based Assessment Reasonableness Analysis

The study partitions the samples in the dataset by aggregation using the density clustering algorithm and adds sparsely distributed images with smaller weights, allowing the model to learn more and more complex aesthetic rules. To test the reasonableness of using DA in the model, the experiment divides the dataset into compact (C), sparse (S), and highly sparse (HS) subsets in a certain ratio, and trains with different combinations of training data. The Spearman rank correlation (SPRC) and classification accuracy are used as evaluation indices; the final aesthetic evaluation results are shown in Table 1.

Table 1. Comparison of the final aesthetic evaluation results for the four training data.

Index                              | ALL     | C      | C+S     | C+S+HS
-----------------------------------|---------|--------|---------|------------
Loss weight                        | (1,1,1) | (1,0)  | (1,0.5) | (1,0.5,0.5)
Two-classification accuracy (%)    | 79.08   | 79.14  | 80.74   | 79.23
Multi-classification accuracy (%)  | 60.00   | 61.52  | 66.78   | 64.98
SPRC                               | 0.625   | 0.628  | 0.704   | 0.672

According to Table 1, when training with the compact dataset only, the binary classification accuracy of the model is 79.14%, the multi-classification accuracy is 61.52%, and the SPRC is 0.628. After adding the sparse subset, binary accuracy rises to 80.74%, multi-classification accuracy to 66.78%, and SPRC to 0.704. This indicates that adding sparsely distributed images with smaller weights helps the model learn more complex aesthetic rules, thereby improving evaluation accuracy. When the highly sparse subset is further incorporated, the classification accuracies and SPRC decrease slightly, possibly because the highly sparse data introduces too much noise, which negatively affects training. In summary, the DA-based evaluation method effectively improves the rationality of image aesthetic evaluation. To investigate further, the sparse and compact subsets are used to compare the impact of different loss-function weights on the model, tracking classification accuracy as the weights are adjusted. Fig. 10 presents the findings.

Fig. 10. Classification of models under different loss function weights.

../../Resources/ieie/IEIESPC.2026.15.2.163/fig10.png

In Figs. 10(a) and 10(b), when the weight of the loss function is [1, 0.5], the classification accuracy of the model reaches 80.4%. This indicates that appropriately reducing the weight of sparse datasets when evaluating them can help improve the classification accuracy of the model. However, when the weight of the loss function is further adjusted to [1, 0.3] or [1, 0.7], the classification accuracy decreases. This indicates that when evaluating highly sparse datasets, excessively reducing or increasing their weights may result in the model being unable to fully learn the aesthetic rules in these datasets, thereby affecting the accuracy of the evaluation.

4.4. Effects of Model Application

The model was applied in actual AD work: 20 AD practitioners were selected to use it in their regular design tasks for one month. Afterwards, their experience was analyzed through a questionnaire. The results are shown in Table 2.

Table 2. Results of the questionnaire from 20 AD design practitioners.

Project | Very consistent (%) | Fairly consistent (%) | General (%) | Very inconsistent (%)
The model meets the use requirements | 80.41 | 10.22 | 6.82 | 2.55
The model enables an accurate assessment | 76.85 | 10.47 | 7.11 | 5.57
The model has a good use experience | 80.21 | 10.84 | 6.95 | 2.00
The model has good classification performance | 79.48 | 10.84 | 7.08 | 2.60

Table 2 shows that after one month of use, more than 80% of the trialists rated the model as fully meeting their use requirements, and 76.85% rated its assessments as very accurate. This shows that the constructed model can help practitioners design more professionally in practical applications.

To further enrich the depth and breadth of the research, qualitative samples and results were added to evaluate the applicability and effectiveness of the model. Qualitative samples were first collected from different advertising design fields, including graphic design, video advertising, and online advertising. The samples cover a variety of styles and topics to ensure that the model can handle diverse aesthetic needs. In collaboration with advertising design practitioners, the research team analyzed and labeled these samples in detail to ensure that they were representative of design needs in real work. After enough qualitative samples had been collected, they were fed into the constructed model for testing, and the model's assessments were compared with the professional judgment of advertising design practitioners. The comparison results are shown in Table 3.

Table 3. Qualitative results of aesthetic evaluation methods for advertising design images in research and design.

Project | Model's evaluation results (%) | Practitioners' professional judgment (%)
Very consistent | 78.5 | 80.2
Fairly consistent | 12.3 | 10.5
General | 6.2 | 6.8
Very inconsistent | 3.0 | 2.5

As shown in Table 3, the evaluation results of the model are highly consistent with the professional judgments of advertising design practitioners: 78.5% of the model's evaluations were rated "very consistent" with practitioners' judgments, 12.3% were rated "fairly consistent," 6.2% "general," and only 3.0% "very inconsistent." These results show that the model can accurately evaluate the aesthetic quality of advertising design images to the professional standards of the industry, and in most cases correctly identifies the aesthetic elements in a design and gives reasonable evaluation results. This further verifies the validity and applicability of the model in practical applications.
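One simple way to quantify the agreement reported in Table 3 is the total variation distance between the two percentage distributions (half the sum of absolute per-category differences). This helper is an illustrative sketch, not part of the original study:

```python
def distribution_gap(model_pct, human_pct):
    """Total variation distance, in percentage points, between two
    category distributions over the same rating scale."""
    return 0.5 * sum(abs(m - h) for m, h in zip(model_pct, human_pct))

# Percentages from Table 3, in order: very consistent, fairly
# consistent, general, very inconsistent
model = [78.5, 12.3, 6.2, 3.0]
human = [80.2, 10.5, 6.8, 2.5]
gap = distribution_gap(model, human)  # small gap -> close agreement
```

For the Table 3 figures the gap is only a few percentage points, consistent with the claim that the model's ratings track practitioners' judgments closely.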

5. Conclusion

As the size of aesthetics datasets increases, graphic design images exhibit significant differences in response and diversity across dimensions, and the features that must be extracted become increasingly complex. To carry out a precise and scientific assessment of the aesthetic value of graphic AD images, the study constructed a CNN- and DA-based IAA model for graphic AD. The CNN was used to build both the region-of-interest sub-network and the multi-scale information sub-network, and the model was trained using DA with sparse samples incorporated at reduced weights. The experimental analysis revealed that the CNN-extracted features had a discrimination degree of 0.68, a correlation coefficient of 0.004, and an average computation time of 0.047 s. The aesthetic assessment yielded a goodness-of-fit of 0.907 with scores from the AVA dataset. Model 1 demonstrated a classification accuracy of 85.79% and a mean scoring error of 0.521, both considerably better than the other two models. Combining compact data with sparse data during training produced more effective results, and after one month of testing, 80% of users expressed satisfaction with the constructed model's user experience. These results suggest that the research model can efficiently evaluate the aesthetic qualities of graphic AD images with good generalization ability and stability. In future research, further optimization of the model's structure and parameters could enhance its classification accuracy and fit. Moreover, exploring the model's application in other domains, including IAA in art, literature, and history, would be valuable.

References

1. Zhang L., Zhang P., 2021, Research on aesthetic models based on neural architecture search, Journal of Intelligent & Fuzzy Systems, Vol. 41, No. 2, pp. 2953-2967. DOI
2. Khare S. K., Bajaj V., 2020, Time–frequency representation and convolutional neural network-based emotion recognition, IEEE Transactions on Neural Networks and Learning Systems, Vol. 32, No. 7, pp. 2901-2909. DOI
3. Tripathi M., 2021, Analysis of convolutional neural network based image classification techniques, Journal of Innovative Image Processing (JIIP), Vol. 3, No. 2, pp. 100-117. DOI
4. Sharma T., Nair R., Gomathi S., 2022, Breast cancer image classification using transfer learning and convolutional neural network, International Journal of Modern Research, Vol. 2, No. 1, pp. 8-16. Google Search
5. Gururaj N., Vinod V., Vijayakumar K., 2023, Deep grading of mangoes using convolutional neural network and computer vision, Multimedia Tools and Applications, Vol. 82, No. 25, pp. 39525-39550. DOI
6. Okarma K., Fastowicz J., 2020, Improved quality assessment of colour surfaces for additive manufacturing based on image entropy, Pattern Analysis and Applications, Vol. 23, No. 3, pp. 1035-1047. DOI
7. Kandel I., Castelli M., 2020, The effect of batch size on the generalizability of the convolutional neural networks on a histopathology dataset, ICT Express, Vol. 6, No. 4, pp. 312-315. DOI
8. Li W., 2022, Aesthetic assessment of packaging design based on con-transformer, International Journal of e-Collaboration (IJeC), Vol. 19, No. 5, pp. 1-11. DOI
9. Zhu H., Li L., Wu J., Zhao S., Ding G., Shi G., 2020, Personalized image aesthetics assessment via meta-learning with bilevel gradient optimization, IEEE Transactions on Cybernetics, Vol. 52, No. 3, pp. 1798-1811. DOI
10. Yan W., Li Y., Yang H., Huang B., Pan Z., 2022, Semantic-aware multi-task learning for image aesthetic quality assessment, Connection Science, Vol. 34, No. 1, pp. 2689-2713. DOI
11. Xiang X., Cheng Y., Chen J., Lin Q., Allebach J., 2020, Semi-supervised multi-task network for image aesthetic assessment, Electronic Imaging, Vol. 2020, No. 8, pp. 188-188. DOI
12. Niu Y., Chen S., Song B., Chen Z., Liu W., 2022, Comment-guided semantics-aware image aesthetics assessment, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 33, No. 3, pp. 1487-1492. DOI
13. Wang W., Wen Z., Chen J., Gu Y., Peng Q., 2024, Evaluation method for virtual museum interface integrating layout aesthetics and visual cognitive characteristics based on improved gray H-convex correlation model, Applied Sciences, Vol. 14, No. 16, pp. 7006-7010. DOI
14. Ilesanmi A. E., Ilesanmi T. O., 2021, Methods for image denoising using convolutional neural network: A review, Complex & Intelligent Systems, Vol. 7, No. 5, pp. 2179-2198. DOI
15. Cao X., Yao J., Xu Z., Meng D., 2020, Hyperspectral image classification with convolutional neural network and active learning, IEEE Transactions on Geoscience and Remote Sensing, Vol. 58, No. 7, pp. 4604-4616. DOI
16. Van S. P., Le H. M., Thanh D. V., Dang T. D., Loc H. H., Anh D. T., 2020, Deep learning convolutional neural network in rainfall–runoff modelling, Journal of Hydroinformatics, Vol. 22, No. 3, pp. 541-561. DOI
17. Irmak E., 2021, Multi-classification of brain tumor MRI images using deep convolutional neural network with fully optimized framework, Iranian Journal of Science and Technology, Transactions of Electrical Engineering, Vol. 45, No. 3, pp. 1015-1036. DOI
18. Wieczorek M., Siłka J., Woźniak M., Garg S., Hassan M. M., 2021, Lightweight convolutional neural network model for human face detection in risk situations, IEEE Transactions on Industrial Informatics, Vol. 18, No. 7, pp. 4820-4829. DOI
19. Sarvamangala D. R., Kulkarni R. V., 2022, Convolutional neural networks in medical image understanding: A survey, Evolutionary Intelligence, Vol. 15, No. 1, pp. 1-22. DOI
20. Preethi P., Mamatha H. R., 2023, Region-based convolutional neural network for segmenting text in epigraphical images, Artificial Intelligence and Applications, Vol. 1, No. 2, pp. 119-127. DOI
Lina Yu
../../Resources/ieie/IEIESPC.2026.15.2.163/au1.png

Lina Yu obtained her master of arts degree from Shandong Normal University in 2009. Currently, she serves as a lecturer and the director of the office at the New Generation Information Technology Industry College of Shandong Vocational College. She holds the title of Senior Graphic Designer and has frequently served as a judge for various competitions. She has led one provincial key research project titled "Research on the Nationality and Benevolence of Public Welfare Posters," participated in four others, and been involved in the Shandong Vocational Education Skills and Techniques Inheritance and Innovation Platform Construction Project. She has authored one digital textbook, "Advertising Poster Design," and two school-based textbooks, "Graphic Creativity" and "Logo Design." Under her guidance, students have won over 30 gold, silver, and bronze awards in various competitions. She has published six academic papers and led the application for one project under the Ministry of Education's Supply-Demand Alignment Program. She has also applied for one software copyright. Her main research areas include graphic design, portrait processing, advertising design, and virtual reality (VR) design.

Li Xu
../../Resources/ieie/IEIESPC.2026.15.2.163/au2.png

Li Xu obtained a master of engineering degree in computer technology from Shandong University in 2006. Currently, she serves as an associate professor and the program leader of the Animation Production Technology major at the New Generation Information Technology Industry College of Shandong Vocational College. She holds professional certificates such as Senior Graphic Designer and Animation Game Designer, and has served as a judge for provincial and ministerial-level skills competitions on multiple occasions. She has won one first prize in the Shandong Vocational Education Teaching Achievement Award and one third prize in the Jinan Computer Science and Technology Award. She has led one project under the Ministry of Education's Supply-Demand Alignment Program and participated as a key member in eight provincial and ministerial-level teaching and research projects. She is the chief editor of the textbooks "Introduction to Operating Systems" and "C Language Programming Project Tutorial." She holds two utility model patents and one software copyright. Her main research areas include graphic design, 3D modeling technology, and virtual reality technology.