Lim Younghoon1†
Park Kwanwoo2†
Paik Joonki1,3
-
(Image Processing and Intelligent Systems Laboratory, Graduate School of Advanced Imaging Science, Multimedia)
-
(Samsung Research, Seoul 06765, Korea, rhdn4375@gmail.com)
-
(Graduate School of Artificial Intelligence, Chung-Ang University, Seoul 06974, Korea)
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Keywords
Digital cinema, DCP, DCI-P3, Color gamut, CNN
1. Introduction
Generally, viewers have different experiences watching the same movie at the theater and at home. Existing studies on image quality, however, have mainly addressed image enhancement, such as 8K Ultra High Definition (UHD) or High Dynamic Range (HDR) [1-5]. To accurately evaluate the quality of images as consumers perceive it, "Emotional Image Quality" is widely used in the display industry, mainly for evaluation methods [6-8], but studies on how to enhance it remain insufficient. Movie producers and conservative audiences want to keep the same color and viewing experience at home as at the theater. Therefore, research is needed to realize cinema image quality in the most popular viewing environment.
Streaming movies are distributed in formats such as H.264, AVI, MP4, and MOV, whereas the Digital Cinema Package (DCP) is the format used for theatrical screening. Streaming movies and DCP provide fundamentally different colors [9]. Therefore, the difference between streaming movies and DCP must be understood to provide higher-quality images.
This study analyzes the difference between the DCP and streaming movies and presents a color conversion method that minimizes the color difference. The specific goal of this work is to provide a cinema-like color that is as similar as possible to the movie projected at the theater. To this end, the proposed method consists of the following steps: i) color conversion using the Society of Motion Picture and Television Engineers (SMPTE) standard and ii) end-to-end color mapping using a convolutional neural network (CNN).
This paper is organized as follows. Section 2 introduces the workflow of digital cinema
with an emphasis on post-production and analyzes the reason for the difference in
image quality. Section 3 proposes a CNN-based color conversion method based on the
SMPTE standard. Section 4 presents the experimental results, and Section 5 concludes the paper. The proposed method can help enhance image quality in the display and OTT industries.
2. Related Works
Few studies have conducted learning-based research on the color gamut. H. Lee et al.
studied the implementation of real-time color gamut mapping using a neural network
[10]. Xiaokai Liu et al. studied pose and color-gamut Guided Generative Adversarial Network
for Pedestrian Image Synthesis [11]. Learning-based color enhancement research was considered necessary to realize DCP
image quality. This research is built upon a convolutional neural network (CNN). This
section introduces related works to CNN models and the post-production workflow.
2.1 Convolutional Neural Network
The CNN is one of the most significant network architectures in the deep learning field. Since CNNs have made impressive achievements in many areas, including but not limited to computer vision and natural language processing, they have attracted considerable attention from both industry and academia in the past few years [12].
CNNs are also widely used in image enhancement. Li et al. proposed a trainable CNN for weakly illuminated image enhancement [13], and Tao et al. proposed a CNN for low-light image enhancement [14]. As these examples show, CNN-based image quality improvement is often applied to low-light enhancement. Chen's method [15] produces high-quality RGB images from single raw images taken under low-light conditions; however, it cannot be applied to RGB color images that are not raw images. Yang et al. [16] proposed a method for enhancing RGB images using two CNNs, which generates intermediate HDR images from input RGB images and then produces high-quality LDR images. On the other hand, generating HDR images from a single image is a well-known, difficult problem [17,18]. For this reason, the performance of Yang's method is limited by the quality of the intermediate HDR images. Therefore, this study improves color enhancement performance using a network architecture that considers local image information.
2.2 Post Production Workflow
Fig. 1 shows the general post-production process for digital cinema. Post-production processes
for making DCP and streaming movies, from editing to color correction of film, are
the same. On the other hand, the distribution processes are different in the rendering
process and screening display.
Most existing motion picture content is produced in DCI-P3, the color space most commonly used for digital movie projection [19,20]. Nevertheless, the file format changes depending on how the movie is distributed, and the color gamut changes accordingly. Films for the theater are generally made in DCP format and screened on a DLP projector, which has the DCI-P3 color gamut. Films for streaming are encoded with codecs such as AVI, MPEG, and H.264 and have the sRGB color gamut. The display devices most commonly used at home are monitors and televisions; most of these LED displays support sRGB, and only some recently released products support DCI-P3 [21,22]. A significant point is that the DCP can change the color gamut and resolution through its metadata, because the widest color gamut that audiences can watch in theaters is DCI-P3.
As shown in Fig. 2, the difference in color gamut between the DCP and the streaming movie is significant. sRGB (Rec. 709) has a smaller color gamut than DCI-P3, so the difference between the red and green primaries of the two gamuts can cause color errors where colors cannot be reproduced. Therefore, there is a color difference between the DCP and streaming movies. This difference can have various causes, but the most crucial factor is the color gamut. Thus, Section 3 introduces the SMPTE standard color gamut conversion and proposes a method to minimize the color difference.
Fig. 1. Post-production workflow for digital cinema.
Fig. 2. Color gamuts: (a) the smallest triangle represents sRGB that is equivalent to BT.709; (b) the triangle in the middle represents DCI-P3; (c) the biggest triangle represents BT.2020.
3. Color Difference Minimization
This section presents an end-to-end learning-based color conversion method to minimize
the color difference between the DCP and streaming movies. The proposed network includes
training and test phases, as shown in Fig. 3.
In the training phase, paired DCP and streaming movie sequences are used to train the CNN for optimal end-to-end mapping. In the test phase, two outputs are produced from the input sRGB image: one by the CNN and one by the standard matrix conversion. Saturation errors in the CNN-based color-converted image are then replaced with the SMPTE matrix conversion result using image fusion.
Fig. 3. Overall architecture of the proposed network: (a) training phase; (b) test phase.
3.1 Standard Color Transformation to Change Color Gamut
SMPTE is an internationally recognized standards-developing organization. The color gamut can be changed using a color matrix transform based on ``D-Cinema Quality - Reference Projector and Environment,'' the SMPTE Recommended Practice [23].
According to the SMPTE technical document, converting from sRGB to the DCI-P3 color gamut takes two steps. In the first step, the color transformation converts the $R'G'B'$ space to the $X'Y'Z'$ space. Given a 12-bit input $R'G'B'$ color space with the range [0, 4095], its gamma-corrected version is obtained by
where $\gamma =2.6$. The gamma-corrected $RGB$ space is represented in a floating-point format with the range [0, 1], and its values are mapped to the $XYZ$ space using a 3${\times}$3 linear transformation matrix as
The $XYZ$ space is then converted to the 12-bit $X'Y'Z'$ space with the range [0, 4095] as
where $L=48\,cd/m^{2}$ represents the luminance of the reference white, and $P$ and $\gamma$ are $52.37\,cd/m^{2}$ and 2.6, respectively.
In the second step, the color transformation converts from $X'Y'Z'$ to DCI-P3. The $X'Y'Z'$ space is transformed to the DCI-P3 color space using the following two steps.
where the same parameters $P$, $L$, and $\gamma$ as in (3) are used, and the DCI-P3 RGB primaries $R_{D}G_{D}B_{D}$ of a cinema projector are obtained according to the SMPTE standard as
The resulting DCI-P3 color image is obtained by applying the inverse gamma correction to the $R_{D}G_{D}B_{D}$ color.
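For concreteness, the two-step conversion described above can be sketched in a few lines of numpy. This is a minimal sketch, not the authoritative implementation: the 3${\times}$3 matrices below are the commonly published sRGB-to-XYZ and XYZ-to-P3 primary matrices, and the rounding and clipping details are assumptions; the exact values prescribed by SMPTE RP 431-2 may differ.

```python
import numpy as np

GAMMA = 2.6       # gamma used in the text
L_WHITE = 48.0    # reference white luminance L, cd/m^2
P_PEAK = 52.37    # peak luminance P of the transfer function, cd/m^2

# Assumed matrices: commonly published sRGB(D65)->XYZ and XYZ->P3 primaries,
# not necessarily the exact SMPTE RP 431-2 values.
SRGB_TO_XYZ = np.array([[0.4124, 0.3576, 0.1805],
                        [0.2126, 0.7152, 0.0722],
                        [0.0193, 0.1192, 0.9505]])
XYZ_TO_P3 = np.array([[ 2.4935, -0.9314, -0.4027],
                      [-0.8295,  1.7627,  0.0236],
                      [ 0.0358, -0.0762,  0.9569]])

def srgb12_to_dcip3(rgb12):
    """rgb12: H x W x 3 array of 12-bit R'G'B' codes in [0, 4095]."""
    # Step 1a: linearize the 12-bit R'G'B' codes with gamma 2.6.
    rgb = (rgb12 / 4095.0) ** GAMMA
    # Step 1b: map linear RGB to XYZ with a 3x3 matrix.
    xyz = rgb @ SRGB_TO_XYZ.T
    # Step 1c: re-encode XYZ as 12-bit X'Y'Z' using L, P, and 1/gamma.
    xyz12 = np.round(4095.0 * np.clip(xyz * L_WHITE / P_PEAK, 0, 1) ** (1.0 / GAMMA))
    # Step 2a: decode X'Y'Z' back to linear XYZ.
    xyz_lin = ((xyz12 / 4095.0) ** GAMMA) * P_PEAK / L_WHITE
    # Step 2b: project onto the DCI-P3 RGB primaries R_D G_D B_D.
    rgb_p3 = np.clip(xyz_lin @ XYZ_TO_P3.T, 0, 1)
    # Final inverse gamma correction yields the DCI-P3 coded image in [0, 1].
    return rgb_p3 ** (1.0 / GAMMA)
```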
However, the ground truth differs from the image gamut-converted from sRGB to DCI-P3 using the SMPTE standard method. As shown in Fig. 4, the color-space-converted image is still different from the ground truth DCP image. The saturation-error artifact appears at the boundary between the SMPTE standard-based converted image (b) and the ground truth image (c), as shown in Fig. 4(d). Therefore, an additional color correction step is proposed in the following subsection.
Fig. 4. Comparison of different color spaces: (a) an input image in sRGB; (b) color space converted image using the SMPTE standard; (c) the DCP image in DCI-P3; (d) color difference between (b) and (c).
3.2 End-to-end Learning-based Color Conversion
To minimize the color difference between the converted image and the ground truth
DCP image, as shown in Figs. 4(b) and (c), respectively, this paper presents a locally
adaptive content-based learning approach, as illustrated in Fig. 5.
The proposed method shares the basic framework of the denoising convolutional neural network (DnCNN) by Zhang et al. [24]. Ten layers with 3${\times}$3 filters are used for training, and each layer produces 100 feature maps. The proposed method uses the residual image as the label data, where the residual image is the difference between the color-space-converted result and the reference DCI-P3 image.
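A minimal PyTorch sketch of such a network is shown below. The layer count, filter size, and feature-map width follow the description above; the padding, activation placement, and the sign convention used when adding the predicted residual back to the input are assumptions.

```python
import torch
import torch.nn as nn

class ResidualColorCNN(nn.Module):
    """Sketch of the DnCNN-style network described above: ten 3x3 convolutional
    layers with 100 feature maps each, trained to predict the residual between
    the color-space-converted input and the reference DCI-P3 image."""
    def __init__(self, channels=3, features=100, depth=10):
        super().__init__()
        layers = [nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True)]
        layers.append(nn.Conv2d(features, channels, 3, padding=1))
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        # The body predicts the residual; adding it to the converted input
        # gives the estimated DCI-P3 image (sign convention assumed).
        return x + self.body(x)
```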
The objective function of the proposed network is an $l_{2}$ loss function
where $f^{\left(n\right)}$ represents the $n^{\mathrm{th}}$ ground truth image, $\hat{g}^{\left(n\right)}$ is the $n^{\mathrm{th}}$ result image of the network, and $N$ is the batch size.
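The loss can be sketched as follows; the exact normalization (for example, whether a factor of 1/2 is included) is an assumption.

```python
import torch

def l2_loss(f, g_hat):
    """Batch-averaged l2 loss between the ground-truth DCI-P3 images f and the
    network outputs g_hat, following the description above."""
    n = f.shape[0]                       # batch size N
    return ((f - g_hat) ** 2).sum() / n  # summed squared error divided by N (assumed)
```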
The result of the proposed network exhibits a color saturation error, especially in bright regions, as shown in Fig. 6(a); this is addressed in the following subsection. In DCI-P3 images restored by the standard transformation, highly saturated areas are clipped. Clipping often occurs when the pixel brightness value is large, and the CNN model is trained to increase the value to compensate. As a side effect, the saturation error occurs in areas where the brightness is high but the saturation is low. This color saturation error is removed using fusion in the following subsection.
Fig. 5. Locally adaptive content-based learning to minimize the difference from the reference DCI-P3 images.
3.3 Color Saturation Error Removal using Fusion
The proposed method estimates the binary map of color saturation to remove the color
saturation, which CNN generates, as shown in Fig. 6(b). Because the color saturation is generated in the region with high-intensity values
in HSV color space, the saturated region can be detected using the min channel as
follows:
where the subscript $m$ represents the min channel of each image, and $T\left[\cdot
\right]$ is the binarization operation with an appropriate thresholding value.
The proposed method removes the color saturation using exception handling: pixels in the detected saturation region are restored from the estimated DCI-P3 image as
In general, when compositing images with computer graphics (CG) or image processing, it must be considered that artifacts can occur at the boundary of the saturation error. The area where the saturation error occurs has high brightness but low saturation; it is a side effect of the CNN model incorrectly estimating that there are no values to correct. Because this area is not corrected by the CNN model, there is no problem at the boundary even if it is replaced with the SMPTE standard conversion image, as sketched below.
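A hedged sketch of this detection-and-fusion step follows. The threshold value and the exact form of the binarization $T[\cdot]$ are assumptions; only the overall structure (min-channel thresholding followed by per-pixel replacement with the SMPTE-converted image) comes from the description above.

```python
import numpy as np

def fuse_saturation(g_smpte, g_cnn, threshold=0.95):
    """Detect likely saturation errors in the CNN output from its min channel and
    replace those pixels with the SMPTE matrix-converted image.
    g_smpte, g_cnn: H x W x 3 float arrays in [0, 1]; threshold is an assumption."""
    g_hat_m = g_cnn.min(axis=2)                    # min channel of the CNN result
    m = (g_hat_m > threshold).astype(np.float32)   # binary map M = T[g_hat_m]
    m = m[..., None]                               # broadcast over the RGB channels
    # Keep the CNN result elsewhere; fall back to the SMPTE conversion where M = 1.
    return m * g_smpte + (1.0 - m) * g_cnn
```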
Fig. 6. Color saturation error caused by a CNN.
Table 1. Conventions.
Parameter | Note
$C$ | Linearized $R'G'B'$ coded values by gamma correction
$C'$ | Color-graded $R'G'B'$ coded values of the DCI-P3 color gamut
$\gamma$ | Gamma adjusting parameter
$W$ | Linearized $XYZ$ coded values by gamma correction
$W'$ | Non-linearized $X'Y'Z'$ coded values by inverse gamma correction
$P$ | Peak luminance in the transfer function equation, $52.37\,cd/m^{2}$
$L$ | Mastering white, the reference projector luminance of $48\,cd/m^{2}$
$l$ | Loss function of the proposed network
$N$ | Batch size
$M$ | Binary map
$T$ | Binarization operation with an appropriate threshold value
$f$ | Ground truth DCI-P3 image
$g$ | SMPTE standard matrix-based converted DCI-P3 image from sRGB
$\hat{g}$ | End-to-end learning-based converted DCI-P3 image
$\hat{g}_{m}$ | Minimum channel of $\hat{g}$
$\hat{f}$ | Final result DCI-P3 image
$m$ | Min channel of each image
4. Experimental Results
DCP and streaming movie files of the movie ``Tears of Steel,'' which is distributed under an open license, were used for training and testing to evaluate the performance of the proposed method [25]. One hundred and twenty different scene images were extracted from approximately 10,000 frames, and 33,900 patches were extracted from these scene images as the training data. The patch size was set to 64 ${\times}$ 64, and 128 ${\times}$ 265 cropped patches were used to train the model. For the test, 8,000 samples that were not part of the training data were used.
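The patch preparation can be sketched as follows. This is a minimal sketch: the stride and sampling strategy are assumptions, and only the patch size and overall counts come from the text.

```python
import numpy as np

def extract_patches(srgb_frame, dcp_frame, patch=64, stride=64):
    """Build training pairs by cropping aligned patches from an sRGB frame and the
    corresponding DCP (DCI-P3) frame. The paper reports 33,900 patches of size
    64 x 64 drawn from 120 scene images; the regular-grid sampling here is assumed."""
    pairs = []
    h, w = srgb_frame.shape[:2]
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            pairs.append((srgb_frame[y:y + patch, x:x + patch],
                          dcp_frame[y:y + patch, x:x + patch]))
    return pairs
```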
This subsection compares the streaming movie, the ground truth DCP, and the CNN-based color-corrected images, as shown in Fig. 7. As the figure shows, the proposed method produces colors that are subjectively similar to the DCP image.
The peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM) were used for the objective assessment [26,27]. Table 2 lists the results for the R, G, and B images in Fig. 7 and the average over the test dataset. In terms of PSNR and SSIM, the proposed method outperforms the SMPTE standard-based color matrix conversion from sRGB to DCI-P3.
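These metrics can be reproduced with scikit-image as sketched below; the data range and channel-axis handling assume float images in [0, 1] and a recent scikit-image version.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(dcp_ref, converted):
    """Compute PSNR and SSIM of a converted frame against the DCP ground truth,
    as done for Table 2. Assumes H x W x 3 float images in [0, 1]."""
    psnr = peak_signal_noise_ratio(dcp_ref, converted, data_range=1.0)
    ssim = structural_similarity(dcp_ref, converted, data_range=1.0, channel_axis=2)
    return psnr, ssim
```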
Fig. 7. Results for test images dominated by red (R), green (G), and blue (B) colors: (a) an input image (RGB); (b) ground truth (DCP, DCI-P3); (c) proposed method (CNN-based color conversion, DCI-P3).
Table 2. Comparison of the objective quality assessment using PSNR and SSIM.
Images | PSNR (SMPTE) | PSNR (Proposed) | SSIM (SMPTE) | SSIM (Proposed)
Fig. 7, R | 29.9918 | 32.1825 | 0.9211 | 0.9379
Fig. 7, G | 33.5635 | 34.7896 | 0.9482 | 0.9522
Fig. 7, B | 32.4286 | 33.8258 | 0.9419 | 0.9525
Average | 32.1453 | 33.9178 | 0.9408 | 0.9487
5. Conclusion
This paper presented a novel color conversion and enhancement method that makes a streaming movie look similar to the DCP shown in the theater. The proposed method consists of three main steps: i) SMPTE standard-based color matrix conversion, ii) CNN-based color conversion, and iii) fusion-based color saturation error removal. The color conversion performance was compared objectively using PSNR and SSIM values, and the proposed method improved on the SMPTE standard method. The proposed color correction method can provide a more immersive experience when consumers watch movies on a television or monitor at home. Only ``Tears of Steel,'' an open-license film, could be used for the learning dataset because high-quality DCPs are difficult to obtain due to licensing issues. In future research, a DCP dataset will be prepared using a commercial high-quality movie camera for the learning and test datasets, and the learning-based color enhancement method will be examined using these high-quality DCP datasets. With an optimal dataset, deep learning could achieve cinema image quality at home.
ACKNOWLEDGMENTS
This research was supported by Basic Science Research Program through the National
Research Foundation of Korea (NRF), funded by the Ministry of Education (2022R1I1A1A01071171)
and the Institute of Information & communications Technology Planning & Evaluation
(IITP) grant, which is funded by the Korean government (MSIT) (2021-0-01341, Artificial
Intelligence Graduate School Program (Chung-Ang University))
REFERENCES
Wang Zhihao, Jian Chen, Steven C. H. Hoi, 2020, Deep learning for image super-resolution: A survey, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 43, No. 10, pp. 3365-3387
Eilertsen Gabriel, 2019, HDR image reconstruction from a single exposure using deep
CNNs, ACM transactions on graphics (TOG), Vol. 36, No. 6, pp. 1-15
Azimi M., Bronner T. F., Nasiopoulos P., Pourazad M. T, 2017, A color gamut mapping
scheme for backward compatible UHD video distribution, 2017 IEEE International Conference
on Communications, pp. 1-5
Kumar R., Assuncao P., Ferreira L., Navarro A., Sep 2018, Retargeting UHD 4k video
for smartphones, 2018 IEEE 8th International Conference on Consumer Electronics-Berlin
(ICCE-Berlin), pp. 1-5
Kim Y., Choi J. S., Kim M., 2018, A real-time convolutional neural network for super-resolution
on FPGA with applications to 4K UHD 60 fps video services, IEEE Transactions on Circuits
and Systems for Video Technology, Vol. 29, No. 8, pp. 2521-2534
Kim W., Yim C., Apr 2022, No-reference Image Contrast Quality Assessment based on
the Degree of Uniformity in Probability Distribution, IEIE Transactions on Smart Processing
and Computing, Vol. 11, No. 2, pp. 85-91
Kim W., Yim C., Apr 2022, No-reference Image Contrast Quality Assessment based on
the Degree of Uniformity in Probability Distribution, IEIE Transactions on Smart Processing
and Computing, Vol. 11, No. 2, pp. 85-91
You J., Mar 2017, Methodologies to improve emotional image qualities by optimizing
technological image quality metrics, Korean Society for Emotion and Sensibility, Vol.
20, No. 1, pp. 57-66
Lim K., Li X., Yan Tu , Mar 2019, Effects of curved computer displays on visual performance,
visual fatigue, and emotional image quality, Journal of the Society for Information
Display, Vol. 27, pp. 543-554
Riedel T., Schnöll M., Sep 2019, Workflow steps to create a digital master format,
2016 IEEE 6th International Conference on Consumer Electronics-Berlin (ICCE-Berlin),
pp. 154-158
Lee H., Han D., 2005, Implementation of real time color gamut mapping using neural
network, Proceedings of the 2005 IEEE Midnight-Summer Workshop on Soft Computing in
Industrial Applications, pp. 138-141
Liu X., Liu X., Li G., Bi S., May 2022, Pose and Color-Gamut Guided Generative Adversarial
Network for Pedestrian Image Synthesis, in IEEE Transactions on Neural Networks and
Learning Systems, pp. 1-13
Li Z., Liu F., Yang W., Peng S., Zhou J., 2021, A Survey of Convolutional Neural Networks:
Analysis, Applications, and Prospects, in IEEE Transactions on Neural Networks and
Learning Systems
Li C., Guo J., Porikli F., Pang Y., 2018, LightenNet: A convolutional neural network
for weakly illuminated image enhancement, Pattern recognition letters, Vol. 104, pp.
15-22
Tao L., Zhu C., Xiang G., Li Y., Jia H., Xie X., 2017, LLCNN: A convolutional neural
network for low-light image enhancement, 2017 IEEE Visual Communications and Image
Processing (VCIP), pp. 1-4
Chen C., Chen Q., Xu J., Koltun V., Jun 2018, Learning to See in the Dark, Proceedings
of IEEE Conference on Computer Vision and Pattern Recognition, pp. 3291-3300
Yang X., Xu K., Song Y., Zhang Q., Wei X., Lau R. W., Jun 2018, Image Correction via
Deep Reciprocating HDR Transformation, Proceedings of IEEE Conference on Computer
Vision and Pattern Recognition, pp. 1798-1807
Eilertsen G., Kronander J., Denes G., Mantiuk R. K., Unger J., Nov 2017, HDR image
reconstruction from a single exposure using deep CNNs, ACM Transactions on Graphics,
Vol. 36, No. 6, pp. 1-15
Kinoshita Y., Shiota S., Kiya H., Mar 2017, Fast inverse tone mapping with Reinhard’s
global operator, Proceedings of IEEE International Conference on Acoustics Speech
and Signal Processing, pp. 1972-1976
Soneira Raymond M., 2016, Display color gamuts: NTSC to Rec. 2020, Information Display,
Vol. 32, No. 4, pp. 26-31
Lee E., Tangirala R., Smith A., Carpenter A., Hotz C., Kim H., Kiyoto I., May. 2018,
41‐5: Invited Paper: Quantum Dot Conversion Layers Through Inkjet Printing, In SID
Symposium Digest of Technical Papers, Vol. 49, No. 1, pp. 525-527
Sharma Abhay., 2019, Understanding RGB color spaces for monitors, projectors, and
televisions, Information Display, Vol. 35, No. 2, pp. 17-43
Zamir S. W., Vazquez-Corral J., Bertalmío M., October 2016, Perceptually-based Gamut
Extension Algorithm for Emerging Wide Color Gamut Display and Projection Technologies,
SMPTE 2016 Annual Technical Conference and Exhibition
SMPTE, Apr. 2011, RP 431-2:2011 - SMPTE Recommended Practice - D-Cinema Quality -
Reference Projector and Environment, The society of motion picture and television
engineers, pp. 1-14
Zhang K., Zuo W., Chen Y., Meng D., Zhang L., 2017, Beyond a gaussian denoiser: Residual
learning of deep cnn for image denoising, IEEE Transactions on Image Processing, Vol.
26, No. 7, pp. 3142-3155
Open Movie, 2012, Tears of Steel, Blender Foundation
Poobathy D., Chezian R. Manicka., 2014, Edge detection operators: Peak signal to noise
ratio based comparison, IJ Image, Graphics and Signal Processing, Vol. 10, pp. 55-61
Sara U., Akter M., Uddin M. S., 2019, Image quality assessment through FSIM, SSIM,
MSE and PSNR-a comparative study, Journal of Computer and Communications, Vol. 7,
No. 3, pp. 8-18
Author
Younghoon Lim was born in Seoul, Korea, in 1983. He received his B.S. degree in
media from Sang-Myung University, Korea, in 2007. He received his MFA and Ph.D. in
Film Production and Art and Technology from Chung-Ang University, Korea, in 2010 and
2019, respectively. From 2013 to 2016, he joined BIT computer, where he designed a
healthcare service and trained start-up companies. He is currently an instructor at
Chung-Ang University. His current research interests include filmmaking, the Korean
film industry, and popular culture.
Kwanwoo Park was born in Ulsan, South Korea, in 1994. He received his B.S. and
M.S. degrees in integrative engineering and digital imaging engineering from Chung-Ang
University, Seoul, South Korea, in 2017 and 2019, respectively. From 2019 to 2022, he was an assistant research engineer at PLK Technology, where he developed autonomous driving techniques. He is currently a research engineer at Samsung Research. His current
research interests include deep learning, computer vision, and image enhancement and
restoration for display processing.
Joonki Paik was born in Seoul, South Korea, in 1960. He received his B.S. degree
in control and instrumentation engineering from Seoul National University in 1984
and his M.Sc. and Ph.D. in electrical engineering and computer science from Northwestern
University in 1987 and 1990, respectively. From 1990 to 1993, he joined Samsung Electronics,
where he designed image stabilization chipsets for consumer camcorders. Since 1993,
he has been a faculty member with Chung-Ang University, Seoul, Korea, where he is
currently a Professor with the Graduate School of Advanced Imaging Science, Multimedia,
and Film. From 1999 to 2002, he was a Visiting Professor with the Department of Electrical
and Computer Engineering, The University of Tennessee, Knoxville. Since 2005, he has
been the Director of the National Research Laboratory in image processing and intelligent
systems. From 2005 to 2007, he served as the Dean of the Graduate School of Advanced
Imaging Science, Multimedia, and Film. From 2005 to 2007, he was the Director of the
Seoul Future Contents Convergence Cluster established by the Seoul Research and Business
Development Program. In 2008, he was a full-time Technical Consultant for the System
LSI Division of Samsung Electronics, where he developed various computational photographic
techniques, including an extended depth of field system. He has served as a member
of the Presidential Advisory Board for Scientific/Technical Policy with the Korean
Government. Currently, he serves as a Technical Consultant for the Korean Supreme
Prosecutor's Office for computational forensics. He was a two-time recipient of the
Chester-Sall Award from the IEEE Consumer Electronics Society, the Academic Award
from the Institute of Electronic Engineers of Korea, and the Best Research Professor
Award from Chung-Ang University. He has served the Consumer Electronics Society of
the IEEE as a member of the Editorial Board, Vice President of International Affairs,
and Director of the Sister and Related Societies Committee.