3.1 Model Construction
Because a CNN can automatically extract image features, it can learn from and train on data in a way that imitates brain tissue, and can accomplish tasks such as image classification. The CNN has been widely used in image processing. The basic components of a CNN are the data input layer, the convolution calculation layer, the activation (excitation) layer, the pooling layer, the fully connected layer, and the output layer [16]. The convolutional layer is the core of a CNN. Its function is to extract the features of the image and strengthen the expressive ability of the network through the convolution operation [17]. The calculation of the convolutional layer is divided into two steps: first, image position information is captured, and then feature extraction is performed on the captured image. The change in image size after the first convolution operation is calculated as seen in (1):
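Eq. (1) itself is not reproduced legibly in the source; a plausible reconstruction, using the standard output-size formula for a convolution and consistent with the symbols defined below (here $H_{in}$ and $W_{in}$ are the input height and width, $K$ the kernel size, and $P$ the padding, all assumed symbols not named in the text), is:

$H_{out}=\frac{H_{in}-K+2P}{Stride}+1,\quad W_{out}=\frac{W_{in}-K+2P}{Stride}+1$ (1)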
In (1), $H_{out}$ is the height after convolution, $W_{out}$ is the width after convolution,
and $Stride$ is the step size. Rapid development of the CNN has enabled ordinary people
to imitate the style of a famous artist’s paintings to create works of art in that
style, which is called style migration. The traditional style migration network mainly
uses the VGG-19 model to extract the texture features and content of the image. The network defines a content LOSS function and a style LOSS function, and the final LOSS function is obtained by weighting these two. The final LOSS function is minimized through continuous iterative training to obtain the image after style rendering.
A common style migration model is shown in Fig. 1.
As seen in Fig. 1, common style migration algorithms must separately optimize a blank noise image for every input, which is computationally expensive. Therefore, this study improves on the traditional style rendering technique by fusing a CNN algorithm with a VGG-19 network and using TensorFlow functions to implement the convolution operations, building a fast style-rendering model. The fast style-rendering model based on the improved CNN is shown in Fig. 2.
Fig. 1. Traditional style-migration model.
Fig. 2. Fast style-rendering model with the improved CNN algorithm.
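As a minimal sketch of the generative part of such a model (the convolution, residual, and deconvolution layers described below), assuming standard TensorFlow/Keras layers; the layer counts and filter sizes here are illustrative assumptions rather than the exact configuration used in this study:

import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=128):
    # Two 3x3 convolutions with a skip connection (illustrative sizes).
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    return layers.add([x, y])

def build_generator():
    inputs = layers.Input(shape=(None, None, 3))
    # Convolutional (downsampling) layers.
    x = layers.Conv2D(32, 9, strides=1, padding="same", activation="relu")(inputs)
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2D(128, 3, strides=2, padding="same", activation="relu")(x)
    # Residual layers preserve the content structure of the input image.
    for _ in range(5):
        x = residual_block(x, 128)
    # Deconvolution (transposed convolution) layers restore the original resolution.
    x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
    outputs = layers.Conv2D(3, 9, padding="same", activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs)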
As can be seen in Fig. 2, the fast style-rendering model is divided into two main parts: the generative model
and the loss model. In the generative model, the original image is input, and after
a series of operations, the final output is a similarly styled image. The generative
model is essentially a convolutional neural network structure consisting of a convolutional
layer, a residual layer, and a deconvolution layer. The loss model is essentially
a pre-trained VGG-19 network structure that does not require weight updates during
training, but is only used to calculate the loss values for content and style, and
then to update the weights of the generative model through back-propagation. During the training phase, the fast style-rendering model selects a style image, Ys, and a content image, Yc; different combinations of style and content images are trained into different network models. In order to calculate the
difference between resulting image Y and the sample image, the LOSS model is used
to extract the information of these images in different convolutional layers and compare
them. Then, the weights are changed by back-propagation so that the resulting image
Y is close to Ys in terms of style, and close to Yc in terms of content. The weights
are then recorded to obtain a fast style-rendering model for that style. The LOSS
function for the fast style-rendering model is defined as follows:
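Eq. (2) is not reproduced legibly in the source; a plausible reconstruction of the per-layer content (feature) loss, consistent with the symbol definitions that follow, is:

$l_{content}^{\phi ,i}\left(\hat{M},M\right)=\frac{1}{C_{i}H_{i}W_{i}}\left\| \phi _{i}\left(\hat{M}\right)-\phi _{i}\left(M\right)\right\| _{2}^{2}$ (2)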
In Eq. (2), $\phi $ is the pre-trained VGG-19 model, $i$ is the index of the convolutional layer, $\phi _{i}\left(M\right)$ represents the activation values of image $M$ at layer $i$ of the $\phi $ model, $\hat{M}$ represents the generated image after the model update, and $M$ is the original input image. In $C_{i}H_{i}W_{i}$, $C_{i}$ represents the number of channels of the feature image at layer $i$, $H_{i}$ represents the height of the feature image at layer $i$, and $W_{i}$ represents the width of the feature image at layer $i$. In addition, the Gram matrix is also used, and is given in (3):
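Eq. (3) is likewise not legible in the source; a plausible reconstruction of the Gram-matrix style loss, consistent with the definitions below, is:

$l_{style}^{\phi ,i}\left(\hat{M},M\right)=\left\| G_{i}^{\phi }\left(\hat{M}\right)-G_{i}^{\phi }\left(M\right)\right\| _{F}^{2}$ (3)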
In Eq. (3), $F$ denotes the Frobenius norm of the matrix, and $G_{i}^{\phi }\left(M\right)$ is the Gram matrix of the activation values of image $M$ at layer $i$ in the $\phi $ model, which is defined in (4):
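A plausible reconstruction of the Gram matrix of Eq. (4), consistent with the symbols described next, is:

$G_{i}^{\phi }\left(M\right)_{c,c'}=\frac{1}{C_{i}H_{i}W_{i}}\sum _{h=1}^{H_{i}}\sum _{w=1}^{W_{i}}\phi _{i}\left(M\right)_{h,w,c}\,\phi _{i}\left(M\right)_{h,w,c'}$ (4)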
In Eq. (4), $G_{i}^{\phi }\left(M\right)_{c,c'}$ is the correlation between channels $c$ and $c'$ of image $M$, and $\phi _{i}\left(M\right)_{h,w,c}$ is the activation value of image $M$ at layer $i$ in the $\phi $ model at height coordinate $h$, width coordinate $w$, and channel $c$. The total LOSS of the fast style-rendering model is defined in Eq. (5):
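Eq. (5) is not reproduced legibly in the source; a plausible form, in which the content and style losses are summed with weights (the weights $\alpha $ and $\beta $ are assumed symbols), is:

$L_{total}=\alpha \sum _{i}l_{content}^{\phi ,i}\left(\hat{M},Y_{c}\right)+\beta \sum _{i}l_{style}^{\phi ,i}\left(\hat{M},Y_{s}\right)$ (5)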
In Eq. (5), the total loss value of the model is obtained by weighting the style and content
loss values. To avoid subjective evaluation, which is too strongly influenced by personal preference and emotion and can lead to inaccurate results, the final rendered images are assessed mainly with objective evaluation methods: indicators such as information entropy, mean squared error (MSE), peak signal-to-noise ratio (PSNR), and average gradient are used to make a comprehensive evaluation of the model.
Eq. (6) is an expression for information entropy (IE), where $j$ is the grey value, $p_{j}$ represents the proportion of pixels with a grey value of $j$ in the image, and $L$ is the total number of grey levels. The higher the IE, the higher the quality of the rendered image.
Eq. (7) is an expression for MSE. $M\times N$ indicates the size of the image, and a smaller
value indicates a higher-quality rendered image.
Eq. (8) is an expression for PSNR, in which $k$ is the number of binary bits per pixel, with a default of 8. A higher PSNR means less distortion and a better visual appearance of the image.
Eq. (9) is the expression for average gradient in which $\frac{\partial f}{\partial X}$
and $\frac{\partial f}{\partial Y}$ represent the horizontal and vertical gradients,
respectively. The higher the G value, the clearer the image.
Eq. (10) is an expression for the correlation coefficient (R), where a higher R value indicates
higher correlation between the rendered image and the sample.
Eq. (11) is an expression for the mutual information (MI) between the rendered image and the sample image, where $P_{\hat{M}M}\left(a,b\right)$ is their joint grey-level distribution. A larger value for MI indicates higher correlation between the images.
Eq. (12) is an expression for spatial frequency; a higher SF indicates a more spatially active
image, i.e. a clearer image.
Eq. (13) is an expression for the horizontal direction frequency in the spatial frequency.
Eq. (14) is an expression for the frequency in the vertical direction of the spatial frequency.
Eq. (15) is an expression for the structural similarity index (SSIM), where $\mu _{A}$ and $\mu _{B}$ are the mean values of the rendered image and the sample image, respectively; $\sigma _{A}^{2}$ and $\sigma _{B}^{2}$ denote the variances of the rendered image and the sample image, respectively; $\sigma _{AB}$ is their covariance; and $k_{1}$ and $k_{2}$ are constants. A higher SSIM indicates a higher degree of similarity between the two images.
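As a minimal sketch of how several of these objective indicators can be computed with NumPy for grey-scale images (the formulas follow their standard definitions rather than the paper's exact equations, which are not reproduced here):

import numpy as np

def information_entropy(img, levels=256):
    # Shannon entropy of the grey-level histogram (Eq. (6)-style indicator).
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mse(a, b):
    # Mean squared error over an M x N image (Eq. (7)-style indicator).
    return np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)

def psnr(a, b, k=8):
    # Peak signal-to-noise ratio with k bits per pixel (Eq. (8)-style indicator).
    peak = 2 ** k - 1
    return 10 * np.log10(peak ** 2 / mse(a, b))

def average_gradient(img):
    # Mean per-pixel gradient magnitude (Eq. (9)-style indicator).
    gy, gx = np.gradient(img.astype(np.float64))
    return np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2))

def spatial_frequency(img):
    # Horizontal and vertical frequencies combined (Eqs. (12)-(14)-style indicator).
    img = img.astype(np.float64)
    rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))
    cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))
    return np.sqrt(rf ** 2 + cf ** 2)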
3.2 Front- and Back-end Network Construction
Based on the fast style-rendering model constructed in the previous section, this
section combines Python algorithms with a Python Web framework to build the server-side
back end of the system, allowing users to access the style rendering system via the
web, upload their own images, and complete real-time rendering of the images in the
selected style [18]. The system's server front end uses the Bootstrap development framework to improve
adaptability to different browsers. The system server back end is divided into three
main parts. The first part is the Uniform Resource Locator (URL) module, which receives
URL requests from the front end and feeds them into the target function for execution.
The second part is the logic processing module, which mainly performs image processing,
including functions such as transcoding and biasing [19]. The third part is the fast style-rendering algorithm module, which completes the
style conversion of the image so that the input image can be rendered into the target
style according to instructions and can be presented to the user smoothly, as shown
in Fig. 3.
Fig. 3 is a flow chart of the style rendering system. The URL module makes the whole system
more stable, and all instructions from the front end need to be filtered by the URL
module first. To add a new function, one only needs to write the function and forward it through the URL module, which greatly reduces development effort and lowers the threshold for using the algorithm. A flow chart of the entire rendering request is shown in Fig. 4.
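The paper does not name the specific Python web framework used; as a minimal sketch of the URL module forwarding a request to the logic processing module, Flask is used here purely as a stand-in, and the route names and field names are illustrative assumptions rather than the system's actual API:

import base64
import uuid
from flask import Flask, request, jsonify

app = Flask(__name__)
TASKS = {}  # in-memory task table; a stand-in for the process scheduling pool

# URL module: each front-end request is filtered through a registered URL rule
# and forwarded to the target function.
@app.route("/render", methods=["POST"])
def render_request():
    encoded = request.form.get("image", "")
    try:
        # Logic processing module: transcode the image content sent by the client.
        image_bytes = base64.b64decode(encoded, validate=True)
    except Exception:
        return jsonify({"status": "error", "message": "image could not be decoded"}), 400
    task_id = str(uuid.uuid4())
    TASKS[task_id] = {"image": image_bytes, "style": request.form.get("style"), "done": False}
    # In the full system the task would be handed to the fast style-rendering module here.
    return jsonify({"status": "accepted", "task": task_id})

@app.route("/progress/<task_id>")
def progress(task_id):
    # The front end polls this route until rendering is complete.
    task = TASKS.get(task_id)
    if task is None:
        return jsonify({"status": "error"}), 404
    return jsonify({"status": "done" if task["done"] else "rendering"})

if __name__ == "__main__":
    app.run()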
As seen in Fig. 4, since the format of the image content transmitted in the network is relatively special,
it is generally necessary to first encode and decode the image content sent by the
client. If it can be successfully decoded, the logic processing module will pass the
image to the fast style-rendering model to execute the rendering algorithm until completion,
and will then present the rendered image to the user. Multiple computations during
the course of operations are executed concurrently, and they potentially interact.
In addition, there are quite a few operating paths that the system can take, and results
may be uncertain. Therefore, after receiving the rendering request in the background,
the system hands over the request to the process scheduling function, which arranges
the rendering tasks according to the situation in the scheduling pool. The process
scheduling pool is shown in Fig. 5.
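As a minimal sketch of such a process scheduling pool, assuming Python's standard multiprocessing module; the rendering function here is only a placeholder for the fast style-rendering model:

from multiprocessing import Pool

def render_task(args):
    # Placeholder for the fast style-rendering model; returns a fake result.
    image_id, style = args
    return image_id, f"{style}-rendered"

if __name__ == "__main__":
    requests = [(1, "oil"), (2, "ink"), (3, "watercolour")]
    # The scheduling pool assigns rendering tasks to worker processes
    # according to how many workers are currently free.
    with Pool(processes=2) as pool:
        for image_id, result in pool.imap_unordered(render_task, requests):
            print(f"image {image_id}: {result}")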
The front-end page of the rendering system adopts the Bootstrap framework, which is not only simple and efficient but also performs very well. Bootstrap lowers the threshold for user access from mobile devices by rewriting most of the HTML controls
[20]. The main functions of the front end are as follows. The user uploads photos first,
then selects the style they want rendered. The system performs rendering operations
with the selected photos and styles, and displays the finished image. If the image
is uploaded successfully, the back end signals the front end that the image has been received, and it issues an instruction to render the image. After the
front end receives the instruction, it will continuously ask the back end whether
the rendering operation is complete. Upon completion, it retrieves the rendered image
and displays it. The entire front-end page-rendering process is shown in Fig. 6.
As seen in Fig. 6, the entire front-end operation is divided into four steps. First, a picture is uploaded and rendering is requested; the request then waits for background processing. When the back end responds, rendering progress is queried according to the returned instructions. Finally, the rendered picture, produced with a reference picture from the style picture library, is returned to complete the front-end rendering operation.
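The browser side of this flow is implemented in the Bootstrap front end; for illustration only, the same upload-and-poll protocol can be sketched from the client's perspective in Python with the requests library (the URL paths and field names match the illustrative back-end sketch above and are assumptions, not the system's actual interface):

import base64
import time
import requests

BASE = "http://localhost:5000"  # assumed address of the rendering server

with open("photo.jpg", "rb") as f:
    payload = {"image": base64.b64encode(f.read()).decode("ascii"), "style": "oil"}

# Steps 1-2: upload the picture and request rendering.
task = requests.post(f"{BASE}/render", data=payload).json()["task"]

# Step 3: keep asking the back end whether rendering is complete.
while requests.get(f"{BASE}/progress/{task}").json()["status"] != "done":
    time.sleep(1)

# Step 4: retrieve the rendered image for display (result endpoint assumed).
rendered = requests.get(f"{BASE}/result/{task}").content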
Fig. 3. Flow chart of the style rendering system model.
Fig. 4. Flowchart of the style rendering request module.
Fig. 5. Flow chart of the process pool operation.
Fig. 6. Flow chart of front-end rendering operations.