Jinwoo Kang
Sang-ug Kang*
(Department of Computer Science, Sangmyung University, 20 Hongjimun 2-gil, Jongno-gu, Seoul, Korea
202032027@sangmyung.kr, sukang@smu.ac.kr)
Keywords
Screen capture, Secure image display, Object detection, Screen shot, Virtual fence
1. Introduction
The global content market is growing every year. The analog market represented
by the printing segment is shrinking, and most content is consumed as digital content
[1]. Due to this phenomenon, copyright holders have begun to recognize the importance of protecting the copyright of digital content. Some blogs, SNS services, and websites
apply download or copy protection technology to prevent unauthorized copying of text
and images. When digital resources such as characters and stock images are sold online,
copyright infringement may occur through screen capture or screen photography, even
if an image copy protection function is implemented in an Internet shopping mall.
Various copy protection techniques have been developed to protect digital resources
presented on Internet browsers. Lee [2] devised a proactive control method that deletes clipboard content whenever a computer user creates new clipboard content via "ctrl+c" or "print screen" keystrokes. Background processes in the Windows operating system can detect those specific keystrokes by catching the event-driven messages generated by keystroke events. The reactive control
method [3] is applied to an SNS application such that photo owners are notified when their photos
are copied by others. A "fingerprint" embedded in digital content has also been used
to monitor copyright infringement during content distribution [4].
However, most digital rights management technologies are vulnerable to illegal
copying through taking images with cameras, which is called the "analog hole". Many
digital content shopping malls do not have adequate countermeasures against the analog
hole attack. Hou et al. [5] divided a binary image into two parts: the object and the background. The
whole image is made up of random binary patterns. The difference between the object
and the background is the flickering rate of the binary pattern. The pattern belonging
to the content part blinks slowly so that a white and black pattern is clearly visible,
while the pattern within the background part flickers fast enough to appear at a constant
gray level to human eyes due to the afterimage effect. As a result, the analog hole
attack can only steal random meaningless patterns, not the content. Ji et al. [6] statistically encrypted the area to be protected on a pixel-by-pixel basis and then
decrypted it with polarized glasses. This scheme can be applied to polarized 3D systems and can be extended to support either restricted viewing or selective viewing. However,
the method [6] assumes that a TV display uses an interlaced image and both clockwise and counterclockwise
circular polarizing filters.
Yamamoto et al. [7] proposed a multi-color encryption technique using red, green, blue, magenta, cyan,
yellow, black, and white colors. A displayed image and its corresponding decoding mask share the secret image information, and both the displayed image and the mask are random patterns. The secret image is only visible if the displayed image and the mask
pair are overlaid and separated by a certain distance, so a screen capture attack
can only obtain a random pattern of the displayed image. Later on, Yamamoto et al.
[8] extended their decoding mask method to limit the viewing angle so that an attacker standing to the side cannot see the screen and take a screenshot. Both works [7,8] are based on the visual cryptographic scheme devised by Naor and Shamir [9], a visual secret sharing technique that divides a secret image into N shares. Each share carries only partial information, and the secret is revealed when any K out of the N shares are stacked together. The single-secret protection method was later extended to multiple secrets [10,11]: several secrets are hidden in one displayed image, and multiple decoding masks are used to decrypt them.
Yovo [12] introduced flickering and nonoverlapped virtual fences superimposed over an image
to be protected. If the flickering rate is fast enough, the fences will block the
image when photographed with a camera, but the image will be visible to the human
eye with slight degradation due to the afterimage effect. Park et al. [13] examined the correlation between visual quality and motion of the virtual fence.
They found that, after taking the modeled afterimage effect [14] into account, the perceived image quality is proportional to the flickering rate and inversely proportional to the fence area.
The virtual fence method [12,13] can block an analog hole attack, but at the cost of image quality degradation, and the blinking fence is somewhat annoying to viewers. Therefore, reducing this cost while maintaining the ability to block analog hole attacks is important and meaningful. Accordingly, this study addresses the cost issue of the virtual fence method [13]. We propose a new virtual fence generation method that considers the differential protection value of image regions from a copyright viewpoint. In other words, only the objects in an image are protected instead of applying the virtual fence to the entire image. Since animation and cartoon characters can be detected with deep learning networks to improve copyright protection [15-17], we use a deep neural network to identify critical regions in an image and apply virtual fences only to those regions, thereby minimizing perceived image quality degradation. The contributions of the proposed method are summarized as follows.
· It detects protectable objects in an image and identifies the regions in which virtual fences are applied.
· It merges multiple regions to generate a protection zone so that the virtual fence generation rules are met.
The paper is organized as follows. Section 2 describes works related to a virtual
fence, afterimage modeling, and object detection. Section 3 presents the real-time
protection method against analog hole attacks. Section 4 shows the simulation results.
Finally, we conclude this paper in the last section.
2. Related Works
First, the $\textit{secure image display}$ (SID) technology using a virtual fence
is described in order to make it easier to understand the proposed virtual fence generation
scheme. Then, we cover the fast object detection method to show how the protection
zones are determined.
2.1 Secure Image Display
According to a previous study [13], an image to be protected by the SID should produce PN $\textit{grille images}$ (GNs) with non-overlapping virtual fences. The generated GNs are played back sequentially and repeatedly so that the image is securely displayed on the screen. PN denotes the playback period and is introduced to reduce the design complexity of generating GNs. The virtual fences should be generated to conform to the following four rules. First, the PN value should be greater than or equal to 2. Second, during the PN period, the union of the image areas uncovered by fences should equal the whole image. Third, the current fence area should not overlap with the previous and next fence areas in order to exploit the afterimage effect. Fourth, the combined area of one fence and the space between fences, multiplied by an integer M, should equal the entire image area. The last rule makes the fence design much easier.
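To make the rules concrete, the following Python sketch (ours, not from the paper) builds PN binary grille masks of vertical stripes that satisfy all four rules; the parameter names pn and m and the stripe layout are assumptions for illustration only.

```python
import numpy as np

def make_grille_masks(width, height, pn=3, m=5):
    """Generate PN binary fence masks (1 = fence, 0 = exposed) obeying the
    four virtual fence generation rules. Assumes vertical stripe fences and
    that width is (roughly) divisible by m; pn and m are design parameters."""
    assert pn >= 2                       # rule 1: PN >= 2
    period_w = width // m                # rule 4: (fence + gap) width * M = image width
    fence_w = period_w // pn             # each frame fences one of PN slots per period
    masks = np.zeros((pn, height, width), dtype=np.uint8)
    for n in range(pn):                  # each frame fences a different slot ...
        for p in range(m):
            start = p * period_w + n * fence_w
            masks[n, :, start:start + fence_w] = 1
    # rule 2: every column is exposed in at least PN-1 of the PN frames
    # rule 3: consecutive frames fence disjoint slots, so fences never overlap
    return masks

masks = make_grille_masks(700, 500, pn=3, m=5)
```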
The rationale for using a virtual fence is the afterimage effect, which has been mathematically modeled [14]. Even though the model does not precisely describe the complex human visual system, it at least reveals the factors that influence the quality of the image perceived by the human eye.
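For intuition only, a crude stand-in for such a temporal fusion model is a leaky-integrator filter over the displayed frames, as sketched below; the time constant tau is an arbitrary illustrative value and the filter is not the parameterization of [14].

```python
import numpy as np

def perceive(frames, dt=0.001, tau=0.05):
    """Crude afterimage (temporal fusion) stand-in: a leaky-integrator filter
    over the displayed frame sequence. tau is illustrative only, not the
    parameter of the model in [14]."""
    alpha = dt / tau                           # integration weight per time step
    perceived = frames[0].astype(np.float64)   # start from the first displayed frame
    recognized = [perceived.copy()]
    for f in frames[1:]:
        perceived += alpha * (f.astype(np.float64) - perceived)
        recognized.append(perceived.copy())    # perceived (recognized) image per step
    return recognized
```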
2.2 Object Detection
The object detection technique is necessary under the assumption that the value
of copyright protection in an image is focused on objects rather than the background.
This assumption is reasonable because most animation characters are not treated as
background. Object detection has to solve two main problems: object classification
and localization. Some region-based CNN (R-CNN) algorithms [20-22] use two stages: localization first and then classification. These two-stage detectors
have high classification accuracy, but the region proposal consumes much computational
power, mainly due to the complexity of the selective search method.
Single-stage detectors [18,19] (also known as the YOLO family) try to simulate a human visual system that performs
both classification and localization at the same time. R-CNN methods generally detect
an object more accurately, but they tend to have slower speed and higher background
errors than YOLO algorithms. In this study, object detection was performed using YOLO because what matters is detecting the content part, i.e., everything other than the background, rather than exact object classification. Also, the fast operation of YOLO is suitable for a real-time application. The output of a YOLO network is a set of rectangular bounding boxes, each with a label. If only one object is detected, applying the virtual fence method is not difficult. However, there may be multiple objects, and their bounding boxes frequently overlap each other. In this situation, the virtual fence generation rules [12] are highly unlikely to be satisfied. Therefore, we propose a novel concept, the protection zone, which is determined from the object bounding boxes.
Fig. 1. Block diagram of overall proposed object-wise secure image display method.
3. The Proposed Method
The overall proposed method is illustrated in Fig. 1. Object bounding boxes are generated after the input image passes through the YOLO
network. In this study, we propose a protection zone for applying a virtual fence
by merging and adjusting the bounding boxes. After properly merging the bounding boxes based on their coordinate information, the protection zones are localized. Finally, the protection zones are adjusted so that the virtual fence can be applied while conforming to the virtual fence generation rules. Additional rules are introduced for the case in which multiple protection zones are detected.
3.1 Protection Zone Localization
A simple protection idea is to apply virtual fences independently to each bounding box detected by the object detection method. However, objects in an image
are mostly not separated from each other, so it is very likely that the bounding boxes
surrounding the objects overlap each other. If virtual fences are separately applied
to individual bounding boxes, the overlapped area cannot satisfy the virtual fence
generation rules summarized in Section 2.1. This causes severe distortion of the perceived
image quality inside the overlapped area because some of the overlapped area can be
too frequently occluded by fences. The objective of this study is not to detect objects in an image, but to protect the valuable parts of an image from copyright infringement, so we define a rectangular protection zone that is derived from the bounding boxes.
To create a protection zone, the overlapped bounding boxes should be repeatedly
merged until there are no overlapping regions. Two overlapped bounding boxes are merged
to make a temporary protection zone in the process of finding the final protection
zones. The temporary zone should contain all merged bounding box areas and should
be rectangular in shape. Therefore, some background area can be included in that temporary
protection zone. This process can cause a temporary protection zone to overlap with
bounding boxes that did not originally overlap, which requires another merging between
them. Such examples are shown in Figs. 3(a) and (b) and will be described in more
detail later. The localization process of final protection zones is done when there
are no more intersections between bounding boxes and temporary protection zones.
A bounding box $b_{n}$ can be represented with two coordinates, $(b_{n}x_{1},b_{n}y_{1})$
and $(b_{n}x_{2},b_{n}y_{2})$, which are the top-left and bottom-right points of the
box, respectively. Similarly, the temporary protection zone $p_{n}$ can be represented
with the top-left point $(p_{n}x_{1},p_{n}y_{1})$ and the bottom-right point $(p_{n}x_{2},p_{n}y_{2})$. The final protection zones are represented by $p_{fn}$.
The localization of a protection zone can be divided into three steps as follows.
First, we check whether there are two overlapping bounding boxes. If any, two bounding
boxes $b_{1}$ and $b_{2}$ should be merged so that the temporary protection zone $p_{1}$
is created and is represented using Eq. (1). Second, after creating $p_{1}$, which contains $b_{1}$ and $b_{2}$, $b_{2}$ is deleted in order to mark it as "already included." Third, the temporary protection zone $p_{1}$ should be checked for overlap with $b_{3}-b_{N}$. If it overlaps, $p_{1}$ continues to merge the overlapping bounding boxes. Otherwise, the temporary protection zone becomes a final protection zone. A single protection zone is created after this process. If there are still overlapping bounding boxes, another protection zone is generated in the same way.
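Eq. (1) itself is not reproduced in this excerpt; for two axis-aligned boxes, the merge that yields the smallest rectangle containing both is presumably of the following form (our reconstruction, in the notation above):

$$p_{1}x_{1}=\min (b_{1}x_{1},\,b_{2}x_{1}),\quad p_{1}y_{1}=\min (b_{1}y_{1},\,b_{2}y_{1}),\quad p_{1}x_{2}=\max (b_{1}x_{2},\,b_{2}x_{2}),\quad p_{1}y_{2}=\max (b_{1}y_{2},\,b_{2}y_{2}).$$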
The detailed algorithm is shown in Fig. 2. We define a list $\alpha$ of the boxes $b_{n}$ detected in a single image. The list is stored in the form of a Python dictionary, whose keys are the labels of the $b_{n}$ and whose values are the corresponding coordinates. A (key, value) pair of a label and its coordinates is denoted by $O_{n}$.
The overall structure repeatedly updates the $O_{n}$ into temporary protection zones until there is no more overlap. Localization is the merge process between the $O_{n}$ and proceeds as follows. First, $b_{1}$ is copied to $p_{1}$, and $p_{1}$ is placed in a list named $\beta$ to be used as a temporary protection zone. This is done because, if nothing overlaps until the end of $\alpha$, $b_{1}$ becomes a temporary protection zone $p_{1}$ by itself without any further operation. Then, $p_{1}$ is checked sequentially for overlap against $O_{2}$ to $O_{N}$ in $\alpha$; comparison with its own entry is excluded. If $p_{1}$ overlaps a $b_{n}$, the coordinates of $p_{1}$ are updated through Eq. (1), and $b_{n}$ is removed from $\alpha$ to avoid duplication in the next search of the loop. This means that, as the localization process proceeds, the size of $\alpha$ decreases in most cases. $p_{1}$ then continues the search from the next element, $b_{n+1}$. When $p_{1}$ reaches the end of $\alpha$, the same search process is repeated recursively, starting with the next element remaining on the list, $b_{2}$. When the recursive function finishes, the list $\beta$ that stores the $p_{n}$ replaces $\alpha$, so $\alpha$ now consists of $p_{1}-p_{n}$. If there are still overlapping elements in $\alpha$, the localization function is repeated. Finally, when there are no more overlapping boxes, $\alpha$, consisting of $p_{f1}-p_{fn}$, becomes the output.
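As an illustration, the behavior described above can be sketched in Python as follows; this is an iterative rendering of the recursive procedure, the variable and function names are ours, and the input is assumed to map detection labels to corner coordinates.

```python
def overlaps(a, b):
    """Axis-aligned overlap test for boxes given as (x1, y1, x2, y2)."""
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

def merge(a, b):
    """Eq. (1)-style merge: smallest rectangle containing both boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def localize_protection_zones(boxes):
    """Repeatedly merge overlapping boxes until no overlaps remain.
    'boxes' is a dict {label: (x1, y1, x2, y2)}, mirroring the list alpha."""
    alpha = list(boxes.values())
    changed = True
    while changed:                       # repeat until no element of alpha overlaps
        changed = False
        beta = []                        # temporary protection zones p_1 .. p_n
        while alpha:
            p = alpha.pop(0)             # copy the first remaining box to p
            i = 0
            while i < len(alpha):        # scan the rest of alpha for overlaps
                if overlaps(p, alpha[i]):
                    p = merge(p, alpha.pop(i))   # absorb the box and keep scanning
                    changed = True
                else:
                    i += 1
            beta.append(p)
        alpha = beta                     # beta replaces alpha for the next pass
    return alpha                         # final protection zones p_f1 .. p_fn
```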
An example of the protection zone localization algorithm is illustrated in Fig. 3. Fig. 3(a) shows six bounding boxes $b_{1}-b_{6}$ detected by the YOLO network. Initially, $b_{2}$
and $b_{5}$ overlap, and so do $b_{3}$ and $b_{6}$, so they need to be merged later
on. The subscripts indicate the order in which the bounding boxes are stored in the list $\alpha$. $p_{1}$ is a temporary protection zone that does not overlap with anything else during the search process.
A protection zone $p_{2}$ begins to be created in Fig. 3(c) by merging $b_{2}$ and $b_{5}$ according to the algorithm in Fig. 2. In this case, $p_{2}$ must be merged further because it now shares some area with $p_{1}$, which did not originally overlap. In Fig. 3(d), $b_{3}$ is found to overlap $b_{6}$, and $p_{3}$ is generated by merging the two.
$b_{4}$ becomes the temporary protection zone $p_{4}$ by itself without any further operation, as illustrated in Fig. 3(e). $p_{2}$ is then extended to a larger temporary protection zone that includes the object of $p_{1}$, and it becomes the final protection zone $p_{f1}$ since there are no further overlapping boxes or protection zones. Also, $p_{3}$ becomes the final protection zone $p_{f2}$. Similarly, $p_{4}$ becomes the final protection zone $p_{f3}$ by itself without any further operation, as illustrated in Fig. 3(f).
The proposed protection zone localization algorithm was applied to digital animation
resources, and some examples are shown in Fig. 4. We discarded the labeling information from the YOLO output and used only the bounding box information. The many and varied boxes that existed initially were neatly consolidated into areas worth protecting, which shows that the protection zone localization algorithm is effective. However, some problems remain with the final protection zones, so protection zone adjustment is needed; this is discussed in the next section.
Fig. 2. Protection Zone Localization Algorithm.
3.2 Protection Zone Adjustment
After localization of the protection zones, virtual fences must be generated
inside them to create a protected image. The virtual fence generation rules summarized
in Section 2.1 can also be applied to protection zones with some additional rules.
Therefore, we can create virtual fences that move at the same playback rate and exposure
rate in every protection zone. The need for additional rules arises from the difference
between the entire image protection method and the adaptive image protection method
proposed in this study.
We found three additional problems with using virtual fences for protection zones
in all situations. First, there is a small protection zone problem, as shown in Fig. 4(a). There is a possibility that the size of the detected object is very small. This
is not a problem if the small bounding boxes are merged in the process of protection
zone localization. Otherwise, applying a virtual fence to these small protection zones
increases image distortion with little to no protection benefit. Since the purpose
of this study is to protect valuable parts of the image, we remove protection zones
that do not reach the appropriate size. The threshold size is set to 10\% of the image
size.
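A minimal sketch of this filtering step is shown below; it assumes "image size" refers to image area and uses a function name of our own choosing.

```python
def drop_small_zones(zones, img_w, img_h, min_ratio=0.10):
    """Discard protection zones whose area is below min_ratio of the image area.
    Each zone is an (x1, y1, x2, y2) tuple."""
    img_area = img_w * img_h
    return [z for z in zones
            if (z[2] - z[0]) * (z[3] - z[1]) >= min_ratio * img_area]
```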
Second, there is a variable protection area problem, as shown in Figs. 4(a)-(c).
While the virtual fence generation rules are applied to fixed size images, the protection
zone size is variable. The image size used in another study [13] was deliberately predetermined so that the entire image area is equal to the sum
of the fence area and the area between the fences multiplied by the integer M=5. As
a result, the virtual fence generation rule is hard to follow in protection zones
of varying sizes. The solution is to increase the size of the protection zone so that the fixed exposure rate and PN become physically applicable while conforming to the fence generation rules. The zone is enlarged rather than shrunk so that the valuable area is not left exposed by the modification. However, if one protection zone meets another during the change, it must be reduced in size as an exception. Note that the height of the protection zone does not need to change, because the fence generation rules do not constrain the fence height.
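One possible way to perform this enlargement, assuming the fence period width is known and the left edge is kept fixed, is sketched below; this is our own sketch rather than the paper's exact procedure, and clipping at the image boundary stands in for the virtual expansion described in the next paragraph.

```python
def adjust_zone_width(zone, period_w, img_w):
    """Grow a zone's width to the next multiple of the fence period width.
    The left edge is kept fixed; the height is left unchanged; the zone is
    clipped at the image boundary (virtual expansion is handled separately)."""
    x1, y1, x2, y2 = zone
    periods = -(-(x2 - x1) // period_w)      # ceiling division
    new_x2 = x1 + periods * period_w
    return (x1, y1, min(new_x2, img_w), y2)
```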
Third, there is a multiple-protection-zone problem, as shown in Fig. 4(c). The way to cope with this problem is to ensure the same image quality in all protection
zones. The image quality depends on the playback rate and the exposure rate. Since the playback rate affects every protection zone equally, nothing needs to be done about it. However, the size of each protection zone should be adjusted to keep the same exposure rate in all protection zones. The largest protection zone is selected, and its size is changed according to the solution of the second problem. For the other protection zones, the fences are applied using the ratio of the largest zone width to the width of each other zone. The anchor point of the size change is the upper-left coordinate of the protection zone, so smaller protection zones grow or shrink toward the right. If a zone reaches the edge of the image, it is virtually expanded. The virtually expanded area may have virtual fences, but they
are not actually visible on the screen. Some results are shown in Fig. 5.
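Under our interpretation of this scaling rule, the fence and gap widths of each zone could be derived from the largest zone as in the following sketch; the names, rounding, and base parameters are ours, not the paper's.

```python
def fence_params_for_zones(zones, base_fence_w, base_gap_w):
    """Scale fence and gap widths for each zone relative to the widest zone so
    that every zone keeps the same exposure rate (assumed interpretation).
    Each zone is an (x1, y1, x2, y2) tuple; returns (fence_w, gap_w) per zone."""
    widths = [z[2] - z[0] for z in zones]
    w_max = max(widths)
    params = []
    for w in widths:
        scale = w / w_max                        # ratio of this zone to the largest
        params.append((max(1, round(base_fence_w * scale)),
                       max(1, round(base_gap_w * scale))))
    return params
```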
Fig. 3. An example of protection zone localization.
Fig. 4. Three final protection zone examples.
Fig. 5. Sample from Table 1 at 0.573 seconds.
4. Simulation Results
The simulation environment is summarized as follows. Open-source software [24] was used to implement YOLO-based object detection, originally built on Darknet [23]. The YOLO software uses Python 2, TensorFlow 1.0, NumPy, and OpenCV 3. The protection zone localization is implemented in Python as well. The perceived visual quality after applying the virtual fence was objectively measured with the peak signal-to-noise ratio (PSNR) using an afterimage simulation method [13].
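As an illustration of how the bounding boxes might be obtained with darkflow [24], assuming its standard TFNet interface; the model and weight paths below are hypothetical placeholders.

```python
from darkflow.net.build import TFNet
import cv2

# Hypothetical configuration; the exact cfg/weights paths depend on the setup.
options = {"model": "cfg/yolo.cfg", "load": "bin/yolo.weights", "threshold": 0.3}
tfnet = TFNet(options)

img = cv2.imread("sample.jpg")
detections = tfnet.return_predict(img)   # list of dicts with label and box corners

# Keep only the corner coordinates; labels are discarded as described in Section 3.1.
boxes = {i: (d["topleft"]["x"], d["topleft"]["y"],
             d["bottomright"]["x"], d["bottomright"]["y"])
         for i, d in enumerate(detections)}
```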
The data setup is as follows. Five animation images were collected from the Internet from "Tangled," "Frozen," "Aladdin," "Crayon Shin-chan," and "Pokémon," as shown in Fig. 5. Among them, the Crayon Shin-chan image contains only one object, and the remaining images include multiple objects with overlapping or non-overlapping bounding boxes. All images were resized to 700${\times}$500 grayscale images to reduce simulation time and complexity. Two types of virtual fence were used: a black fence and a blur fence. The blur fence is made by Gaussian filtering the original image and is intended to enhance the perceived image quality.
The performance was compared to that of a previous study [13] with the virtual fence parameters set as follows: exposure rate = 9 and playback rate = 30. Recognized images (RIs) were generated at intervals of 0.001 seconds from $t=0$ to $t=1$ second, resulting in a total of 1,000 RIs. The PSNR values were calculated by comparing the resized original image with each RI over the simulation period ($t=0,\ldots,1$). Simulation results are shown in Fig. 6.
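For reference, the PSNR between the resized original image and a recognized image can be computed as in the sketch below, assuming an 8-bit peak value of 255; the averaging over the 1,000 RIs is shown as a commented usage example.

```python
import numpy as np

def psnr(original, recognized, peak=255.0):
    """PSNR (dB) between the resized original image and a recognized image (RI)."""
    diff = original.astype(np.float64) - recognized.astype(np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# Hypothetical usage: average PSNR over the 1,000 RIs generated from t=0 to t=1 s.
# avg_psnr = np.mean([psnr(original, ri) for ri in recognized_images])
```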
The performance of the proposed method is always better than that of the previous method [13] over the entire period. The image quality varies continuously and periodically as time elapses due to the virtual fences and the afterimage effect. The same simulation was performed for all test images, resulting in an average PSNR that is 2.87 dB higher for the black fence and 2.59 dB higher for the blur fence. These results are summarized in Tables 1 and 2. Also, examples of recognized images at 0.573 seconds are shown in Fig. 5.
Table 1. Comparison of image quality (black fence).
Data | PSNR (dB), [13] | PSNR (dB), Proposed
Tangled | 18.03 | 20.33
Frozen | 17.12 | 18.90
Aladdin | 15.73 | 17.92
Crayon Shin-chan | 14.03 | 17.76
Pokémon | 15.32 | 19.69
Average | 16.05 | 18.92
Table 2. Comparison of image quality (blur fence).
Data | PSNR (dB), [13] | PSNR (dB), Proposed
Tangled | 39.13 | 40.84
Frozen | 35.48 | 40.11
Aladdin | 35.66 | 36.37
Crayon Shin-chan | 35.26 | 37.36
Pokémon | 32.98 | 36.79
Average | 35.70 | 38.29
Fig. 6. Image quality from Table 2 data at various times.
5. Conclusion
In this study, we have proposed a new adaptive secure image protection scheme based on a deep object detection network, following the principle of applying protection only where necessary on the displayed screen. The best simulation result was 38.29 dB, which is 2.59 dB higher than that of previous work. The displayed image may still flicker to the human eye, which can cause dizziness. However, the improvement of image quality and the reduction of virtual fence coverage achieved in this work can greatly reduce eye fatigue.
ACKNOWLEDGMENTS
This research was funded by a 2020 research Grant from Sangmyung University.
REFERENCES
KOCCA, 2018, The Market Analysis of Overseas Contents.
Lee J., 2014, Implementation of anti-screen capture modules for privacy protection,
Journal of the Korea Institute of Information and Communication Engineering, Vol.
18, No. 1, pp. 91-96
Charteris J., Gregory S., Masters Y., 2014, Snapchat 'selfies': The case of disappearing data, in Hegarty B., McDonald J., Loke S. K. (eds.), Rhetoric and Reality: Critical Perspectives on Educational Technology, pp. 389-393
Kim W., Lee S., Seo Y., 2006, Image fingerprinting scheme for print-and-capture model,
Pacific-Rim Conference on Multimedia. Springer Berlin Heidelberg, pp. 106-113
Hou J., Kim D., Song H., Lee H., 2016, Secure Image Display through Visual Cryptography:
Exploiting Temporal Responsibilities of the Human Eye, in Proc. of the 4th ACM Workshop
on Information Hiding and Multimedia Security, ACM, pp. 169-174
Ji S., Lee H., 2018, Image Recapture Prevention Using Secure Display Schemes on Polarized
3D System, IEEE Trans. Circuits Syst. Video Technol., pp. 2296-2309
Yamamoto H., Hayasaki Y., Nishida N., 2004, Secure information display with limited
viewing zone by use of multi-color visual cryptography, Opt. Express, Vol. 12, No.
7, pp. 1258-1270
Yamamoto H., Hayasaki Y., Secure display that limits the viewing space by use of optically
decodable encryption, in Advanced Optical and Quantum Memories and Computing IV, vol.
6482. International Society for Optics and Photonics, 2007, p. 64820C.
Naor M., Shamir A., 1994, Visual Cryptography, Advances in Cryptology - EUROCRYPT'94, Lecture Notes in Computer Science 950, pp. 1-12, Springer-Verlag
Shyu S. J., Chen M.-C., Chao K.-M., 2009, Securing information display for multiple
secrets, Optical Engineering, Vol. 48, No. 5, pp. 057005
Yamamoto H., Suyama S., 2011, Secure display by use of multiple decoding masks based
on visual cryptography, in Industry Applications Society Annual Meeting (IAS), 2011
IEEE, pp. 1-5
Yovo
Park S., Kang S., 2017, Visual Quality Optimization for Privacy Protection Bar-based Secure Image Display Technique, KSII Trans. Internet Inf. Syst., pp. 3664-3677
Brettel H., Shi L., Strasburger H., 2006, Temporal image fusion in human vision, Vision
Res., Vol. 46, No. 6, pp. 774-781
Nguyen N-V., Rigaud C., Burie J-C., Comic characters detection using deep learning,
In Proceedings of the 2017 14th IAPR International Conference on Document Analysis
and Recognition (ICDAR), Kyoto, Japan, 9-15 November 2017, pp. 41-46
Khan F. S., Anwer R. M., van de Weijer J., Bagdanov A. D., Vanrell M., Lopez A. M., 2012, Color attributes for object detection, in CVPR, pp. 3306-3313
Zheng Y., Zhao Y., Ren M., Yan H., Lu X., Liu J., Li J., 2019, Cartoon face recognition: A benchmark dataset, arXiv:1907.13394
Redmon J., Divvala S., Girshick R., Farhadi A., 2016, You only look once: Unified, real-time object detection, arXiv preprint arXiv:1506.02640
Redmon J., Farhadi A., 2017, YOLO9000: Better, faster, stronger, in Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on, pp. 6517-6525, IEEE
Girshick R., Donahue J., Darrell T., Malik J., 2014, Rich feature hierarchies for accurate object detection and semantic segmentation, in Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pp. 580-587, IEEE
Girshick R. B., 2015, Fast R-CNN, CoRR, abs/1504.08083
Ren S., He K., Girshick R., Sun J., 2015, Faster R-CNN: Towards real-time object detection with region proposal networks, arXiv preprint arXiv:1506.01497
Redmon J., Darknet: Open source neural networks in C, 2013-2016
Trieu. Darkflow
Author
Jinwoo Kang received his B.S. degree in Computer Science in 2020 from Sangmyung University, Republic of Korea. He is currently a Master's degree candidate in the Department of Computer Science, Sangmyung University, Seoul, Korea. His research interests include multimedia security, reversible data hiding, and deep neural networks.
Sang-ug Kang is a professor in the Department of Computer Science, Sangmyung University, Seoul, Korea. He received his M.S. degree in Electrical Engineering in 1995 from the University of Southern California and his Ph.D. degree in Information Security in 2011 from Korea University. His research interests include multimedia security, information security, and artificial intelligence.