
1. (Department of Computer Science, Sangmyung University / Seoul, 20 Hongjimun 2-gil Jongno-gu, Korea 202032027@sangmyung.kr, sukang@smu.ac.kr )

Screen capture, Secure image display, Object detection, Screen shot, Virtual fence

## 1. Introduction

The global content market is growing every year. The analog market represented by the printing segment is shrinking, and most content is consumed as digital content [1]. Due to this phenomenon, copyright holders have begun to recognize the importance of copyright protection of digital contents. Some blogs, SNS services, and websites apply download or copy protection technology to prevent unauthorized copying of text and images. When digital resources such as characters and stock images are sold online, copyright infringement may occur through screen capture or screen photography, even if an image copy protection function is implemented in an Internet shopping mall.

Various copy protection techniques have been developed to protect digital resources presented on Internet browsers. Lee [2] devised a proactive control method to delete clipboard content whenever a computer user creates new clipboard content via "ctrl+c" or "print screen" key strokes. Background processes in the Windows operating system can detect those specific key strokes by catching the event-driven messages generated by key stroke events. The reactive control method [3] is applied to an SNS application such that photo owners are notified when their photos are copied by others. A "fingerprint" embedded in digital content has also been used to monitor copyright infringement during content distribution [4].

However, most digital rights management technologies are vulnerable to illegal copying by photographing the screen with a camera, which is called the "analog hole". Many digital content shopping malls do not have adequate countermeasures against the analog hole attack. Hou et al. [5] divided a binary-scale image into two parts: the object and the background. The whole image is made up of random binary patterns. The difference between the object and the background is the flickering rate of the binary pattern. The pattern belonging to the content part blinks slowly so that a white and black pattern is clearly visible, while the pattern within the background part flickers fast enough to appear at a constant gray level to human eyes due to the afterimage effect. As a result, the analog hole attack can only steal random meaningless patterns, not the content. Ji et al. [6] statistically encrypted the area to be protected on a pixel-by-pixel basis and then decrypted it with polarized glasses. This scheme can be applied to polarized 3D systems and can be extended to support either restricted viewing or selective viewing. However, the method [6] assumes that a TV display uses an interlaced image and both clockwise and counterclockwise circular polarizing filters.

Yamamoto et al. [7] proposed a multi-color encryption technique using red, green, blue, magenta, cyan, yellow, black, and white colors. A displayed image and the corresponding decoding mask pair share secret image information, and the displayed image and mask are all random patterns. The secret image is only visible if the displayed image and the mask pair are overlaid and separated by a certain distance, so a screen capture attack can only obtain a random pattern of the displayed image. Later on, Yamamoto et al. [8] extended their decryption mask method to limit the viewing angle so that an attacker standing aside cannot see the screen and take a screenshot. These works [7,8] are based on a visual cryptographic scheme devised by Naor and Shamir [9], which is a kind of visual secret sharing technique that divides a secret image into N shares. Each share carries partial information, and the secret is revealed when K of the N shares are stacked together. The single secret image protection method was extended to multiple secret methods [10,11]. Multiple secrets are hidden in a single display image, and multiple decoding masks are used to decrypt the visual secrets.

Yovo [12] introduced flickering and nonoverlapped virtual fences superimposed over an image to be protected. If the flickering rate is fast enough, the fences will block the image when photographed with a camera, but the image will be visible to the human eye with slight degradation due to the afterimage effect. Park et al. [13] examined the correlation between visual quality and motion of the virtual fence. They found that the flickering rate is proportional to the perceived image quality and that the fence area is inversely proportional after taking into account the modeled afterimage effect [14].

The virtual fence method [12,13] can block an analog hole attack, but it does so at the cost of image quality degradation, and the blinking fence can be distracting to viewers. Therefore, reducing this cost while maintaining the capability to block analog hole attacks is important and meaningful. Accordingly, this study addresses this issue [13]. We propose a new virtual fence generation method that considers the differential protection value from the viewpoint of copyright. In other words, only the objects in an image are protected instead of applying the virtual fence to the entire image. Nguyen et al. [15-17] detected animation and cartoon characters based on a deep learning network to improve copyright protection, so we used a deep neural network to identify critical regions in an image and partially apply virtual fences to the identified regions in order to minimize perceived image quality degradation. The contributions of the proposed method are summarized as follows.

$\textbf{·}$ It detects protectable objects in an image and identifies the regions in which virtual fences are applied.

$\textbf{·}$ It merges multiple regions to generate a protection zone so that virtual fence generation rules are met.

The paper is organized as follows. Section 2 describes works related to a virtual fence, afterimage modeling, and object detection. Section 3 presents the real-time protection method against analog hole attacks. Section 4 shows the simulation results. Finally, we conclude this paper in the last section.

## 2. Related Works

First, the $\textit{secure image display}$ (SID) technology using a virtual fence is described in order to make it easier to understand the proposed virtual fence generation scheme. Then, we cover the fast object detection method to show how the protection zones are determined.

### 2.1 Secure Image Display

According to a study [13], one image to be protected by the SID should produce PN $\textit{grille images}$ (GNs) with non-overlapping virtual fences. The generated GNs are played back sequentially and repeatedly so that the image is securely displayed on a display screen. PN is a playback period that reduces design complexity to generate GNs. The virtual fence should be generated to conform to the following four rules. First, the PN value should be greater than or equal to 2. Second, during the PN period, the union of the image areas uncovered by fences should be the whole image. Third, the current fence area should not overlap the previous and next fence areas in order to use the afterimage effect. Fourth, the combined area of one fence and the space between adjacent fences, multiplied by an integer M, should equal the entire image area. The last rule makes the fence design much easier.
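The four rules can be checked mechanically. The following is a minimal sketch, not the fence generator of [13]: it assumes vertical bar fences modeled in one dimension (image columns only), where grille image `k` covers every column whose bar index is congruent to `k` modulo PN. The function names and the parameters `fence_w` and `pn` are hypothetical.

```python
def fence_columns(width, fence_w, pn, k):
    """Columns covered by the fence in grille image k (0 <= k < pn).

    Assumed layout: the image is cut into bars of fence_w columns,
    and grille image k covers every pn-th bar starting at bar k.
    """
    return {c for c in range(width) if (c // fence_w) % pn == k}


def check_rules(width, fence_w, pn):
    """Return (rule1, rule2, rule3, rule4) as booleans for this layout."""
    pitch = fence_w * pn  # one fence plus the gap to the next fence
    all_cols = set(range(width))
    frames = [fence_columns(width, fence_w, pn, k) for k in range(pn)]

    rule1 = pn >= 2
    # Rule 2: over one period, every column is uncovered at least once.
    uncovered = set().union(*(all_cols - f for f in frames))
    rule2 = uncovered == all_cols
    # Rule 3: consecutive grille images never cover the same column.
    rule3 = all(frames[k].isdisjoint(frames[(k + 1) % pn]) for k in range(pn))
    # Rule 4: (fence + gap) times an integer M equals the image width.
    rule4 = width % pitch == 0
    return rule1, rule2, rule3, rule4


print(check_rules(700, 28, 5))  # pitch 140, so M = 5 for a 700-column image
```

In this layout the first three rules hold whenever PN is at least 2, so the fourth rule is the only one that constrains the image width.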

The rationale of using a virtual fence is the afterimage effect, which has been mathematically modelled [14]. Even though the model does not precisely describe the complex human visual system, the factors that influence the quality of the image perceived by the human eye are at least revealed.

### 2.2 Object Detection

The object detection technique is necessary under the assumption that the value of copyright protection in an image is focused on objects rather than the background. This assumption is reasonable because most animation characters are not treated as background. Object detection has to solve two main problems: object classification and localization. Some region-based CNN (R-CNN) algorithms [20-22] use two stages: localization first and then classification. These two-stage detectors have high classification accuracy, but the region proposal consumes much computational power, mainly due to the complexity of the selective search method.

Single-stage detectors [18,19] (also known as the YOLO family) try to simulate a human visual system that performs both classification and localization at the same time. R-CNN methods generally detect an object more accurately, but they tend to have slower speed and higher background errors than YOLO algorithms. In this study, object detection was performed using YOLO because detecting the content part, that is, everything other than the background, matters more than exact object classification. Also, the fast operation of YOLO is suitable for a real-time application. The output of a YOLO network is a set of rectangular bounding boxes, each with a label. If only one object is detected, applying a virtual fence method is not difficult. However, an image often contains multiple objects whose bounding boxes overlap each other. In this situation, the virtual fence generation rules [12] are highly unlikely to be satisfied. Therefore, we propose a novel concept of protection zone, which can be determined based on object bounding boxes.

## 3. The Proposed Method

The overall proposed method is illustrated in Fig. 1. Object bounding boxes are generated after the input image passes through the YOLO network. In this study, we propose a protection zone for applying a virtual fence by merging and adjusting each bounding box. After properly merging the bounding boxes based on the box coordinate information, the protection zones are localized. Finally, the protection zones are adjusted so that the virtual fence can be applied while conforming to the virtual fence generation rules. Additional virtual fence generation rules must be introduced when multiple protection zones are detected.

### 3.1 Protection Zone Localization

The simple protection idea is that virtual fences are applied independently to each bounding box detected by an object detection method. However, objects in an image are mostly not separated from each other, so it is very likely that the bounding boxes surrounding the objects overlap each other. If virtual fences are separately applied to individual bounding boxes, the overlapped area cannot satisfy the virtual fence generation rules summarized in Section 2.1. This causes severe distortion of the perceived image quality inside the overlapped area because some of the overlapped area can be too frequently occluded by fences. The objective of this study is not to detect objects in an image, but to protect a valuable part of an image from copyright infringement, so we need to define a rectangular protection zone that is bounded based on bounding boxes.

To create a protection zone, the overlapped bounding boxes should be repeatedly merged until there are no overlapping regions. Two overlapped bounding boxes are merged to make a temporary protection zone in the process of finding the final protection zones. The temporary zone should contain all merged bounding box areas and should be rectangular in shape. Therefore, some background area can be included in that temporary protection zone. This process can cause a temporary protection zone to overlap with bounding boxes that did not originally overlap, which requires another merging between them. Such examples are shown in Figs. 3(a) and (b) and will be described in more detail later. The localization process of final protection zones is done when there are no more intersections between bounding boxes and temporary protection zones.

A bounding box $b_{n}$ can be represented with two coordinates, $(b_{n}x_{1},b_{n}y_{1})$ and $(b_{n}x_{2},b_{n}y_{2})$, which are the top-left and bottom-right points of the box, respectively. Similarly, the temporary protection zone $p_{n}$ can be represented with the top-left point $(p_{n}x_{1},p_{n}y_{1})$ and the bottom-right point $(p_{n}x_{2},p_{n}y_{2}).$ The final protection zones are represented by $p_{fn}.$

The localization of a protection zone can be divided into three steps as follows. First, we check whether there are two overlapping bounding boxes. If so, the two bounding boxes $b_{1}$ and $b_{2}$ are merged so that the temporary protection zone $p_{1}$ is created, as represented by Eq. (1). Second, after creating $p_{1}$, which contains $b_{1}$ and $b_{2}$, $b_{2}$ is deleted in order to mark it as "already included." Third, the temporary protection zone $p_{1}$ is checked for overlap with $b_{3}$ to $b_{N}$. If it overlaps any of them, $p_{1}$ continues to merge the overlapping bounding boxes. Otherwise, the temporary protection zone becomes the final protection zone. A single protection zone is created after this process. If there are still overlapping bounding boxes, another protection zone can be generated in the same way.

##### (1)
$$\left(p_{1}x_{1},p_{1}y_{1}\right)=\left(\min \left(b_{1}x_{1},b_{2}x_{1}\right),\,\,\min \left(b_{1}y_{1},b_{2}y_{1}\right)\right)$$
$$\left(p_{1}x_{2},p_{1}y_{2}\right)=\left(\max \left(b_{1}x_{2},b_{2}x_{2}\right),\,\,\max \left(b_{1}y_{2},b_{2}y_{2}\right)\right)$$
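As an illustration of Eq. (1), the sketch below merges two boxes given as `(x1, y1, x2, y2)` tuples with the top-left and bottom-right corners. The overlap test is an assumption, since the text does not state how overlap is decided: here two boxes overlap only if they share a region of positive area, so boxes that merely touch along an edge are not merged. The function names are hypothetical.

```python
def overlaps(a, b):
    """True if boxes a and b share a region of positive area (assumed test)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2


def merge(a, b):
    """Eq. (1): smallest rectangle that contains both boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))


b1 = (10, 20, 50, 60)
b2 = (40, 10, 80, 50)
if overlaps(b1, b2):
    p1 = merge(b1, b2)
    print(p1)  # (10, 10, 80, 60)
```

The merged rectangle can contain background that belonged to neither box, which is why a merged zone may start to overlap boxes that were previously separate.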

The detailed algorithm is shown in Fig. 2. We define a list $\alpha$ of $b_{n}$ detected in a single image. The list is in the form of a Python dictionary. The list $\alpha$ contains the labels $b_{n}$ and the corresponding coordinates. A (key, value) pair of the label and its corresponding coordinate is represented by $O_{n}$.

##### (2)
$O_{n}=(\textit{label},\ \textit{coordinate})$
##### (3)
$\alpha=\left\{O_{1},O_{2},\ldots ,O_{N}\right\}$

The overall structure repeatedly updates each $O_{n}$ into a temporary protection zone until there is no more overlap. Localization is the merge process between the $O_{n}$ and is described as follows. The first box $b_{1}$ is copied to $p_{1}$, which is created in a list named $\beta$ to be used as a temporary protection zone. This is done because, when nothing overlaps until the end of $\alpha$, $b_{1}$ becomes a temporary protection zone $p_{1}$ by itself without any further operation. $p_{1}$ is then compared sequentially with $O_{2}$ to $O_{N}$ inside $\alpha$ to check for overlap. At this time, comparison of an element with itself is excluded. If $p_{1}$ overlaps $b_{n}$, we update the coordinates of $p_{1}$ through Eq. (1), and $b_{n}$ is removed from $\alpha$ to avoid duplication in the next search in the loop. This means that the size of $\alpha$ decreases in most cases as localization proceeds. $p_{1}$ continues the search process from the next element $b_{n+1}$. Once $p_{1}$ has gone through the loop to the end of $\alpha$, the same search process repeats recursively starting with the next unremoved element on the list, $b_{2}$. When the recursive function is finished, $\alpha$ is replaced by the list $\beta$ that stored the $p_{n}$, so $\alpha$ now consists of $p_{1}$ to $p_{n}$. If there are still overlapping elements in $\alpha$, the localization function is repeated. Finally, when there are no more overlapping boxes, $\alpha$, now consisting of $p_{f1}$ to $p_{fn}$, becomes the output.
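The procedure can be sketched as follows. This is a simplified illustration rather than the algorithm of Fig. 2: it uses plain lists of `(x1, y1, x2, y2)` tuples instead of a dictionary with labels, replaces the recursion with loops, and assumes that boxes overlap only when they share positive area. All names are hypothetical.

```python
def overlaps(a, b):
    """True if boxes a and b share a region of positive area (assumed test)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge(a, b):
    """Eq. (1): smallest rectangle containing both boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))


def localize(boxes):
    """Merge overlapping boxes until no two zones overlap."""
    alpha = list(boxes)
    changed = True
    while changed:              # repeat while a pass still merges something
        changed = False
        beta = []               # temporary protection zones of this pass
        remaining = alpha
        while remaining:
            p = remaining[0]    # next unremoved element starts a new zone
            rest = []
            for b in remaining[1:]:
                if overlaps(p, b):
                    p = merge(p, b)   # grow the zone, drop b from the list
                    changed = True
                else:
                    rest.append(b)
            beta.append(p)
            remaining = rest
        alpha = beta            # beta replaces alpha for the next pass
    return alpha


# Six hypothetical boxes: the zone merged from the 2nd and 5th boxes
# grows enough to overlap the 1st box, which overlapped neither originally.
boxes = [(0, 0, 10, 3), (20, 0, 40, 10), (50, 50, 60, 60),
         (80, 0, 90, 10), (5, 5, 25, 8), (55, 55, 70, 70)]
print(localize(boxes))
# [(0, 0, 40, 10), (50, 50, 70, 70), (80, 0, 90, 10)]
```

Each pass that merges anything shortens the list by at least one element, so the outer loop always terminates.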

An example of the protection zone localization algorithm is illustrated in Fig. 3. Fig. 3(a) shows six bounding boxes $b_{1}-b_{6}$ detected by the YOLO network. Initially, $b_{2}$ and $b_{5}$ overlap, and so do $b_{3}$ and $b_{6}$, so they need to be merged later on. The subscripts follow the order in which the bounding boxes are stored in the list $\alpha$. $p_{1}$ is a temporary protection zone created from $b_{1}$, which does not overlap any other box during the first search.

A protection zone $p_{2}$ begins to be created in Fig. 3(c) by merging $b_{2}$ and $b_{5}$ according to the algorithm in Fig. 2. In this case, $p_{2}$ must be merged further because it now shares some area with $p_{1}$, which originally did not overlap. In Fig. 3(d), $b_{3}$ is found to overlap $b_{6}$, and $p_{3}$ is generated by merging them. $b_{4}$ becomes the temporary protection zone $p_{4}$ by itself without any further operation, as illustrated in Fig. 3(e). $p_{2}$ is then extended into a larger temporary protection zone that includes the object of $p_{1}$. This zone becomes the final protection zone $p_{f1}$ since there are no further overlapping boxes or protection zones. Also, $p_{3}$ becomes the final protection zone $p_{f2}$. Similarly, $p_{4}$ becomes the final protection zone $p_{f3}$ by itself without any further operation, as illustrated in Fig. 3(f).

The proposed protection zone localization algorithm was applied to digital animation resources, and some examples are shown in Fig. 4. We discarded the labeling information from the YOLO output and used only the bounding box information. The many overlapping boxes of different sizes in the detector output were consolidated into a few rectangular areas worth protecting, which shows that the protection zone localization algorithm is effective. Some problems remain with the final protection zones, so a protection zone adjustment step is needed, which will be discussed in the next section.

##### Fig. 2. Protection Zone Localization Algorithm.

After localization of the protection zones, virtual fences must be generated inside them to create a protected image. The virtual fence generation rules summarized in Section 2.1 can also be applied to protection zones with some additional rules. Therefore, we can create virtual fences that move at the same playback rate and exposure rate in every protection zone. The need for additional rules arises from the difference between the entire image protection method and the adaptive image protection method proposed in this study.

We found three additional problems that arise when virtual fences are applied to protection zones. First, there is a small protection zone problem, as shown in Fig. 4(a). The detected object may be very small. This is not a problem if the small bounding boxes are merged in the process of protection zone localization. Otherwise, applying a virtual fence to these small protection zones increases image distortion with little to no protection benefit. Since the purpose of this study is to protect valuable parts of the image, we remove protection zones that do not reach the appropriate size. The threshold size is set to 10% of the image size.
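The size filter can be sketched in a few lines. The text does not say whether "size" refers to area, width, or height, so this example assumes area. The function name and the `min_ratio` parameter are hypothetical.

```python
def drop_small_zones(zones, img_w, img_h, min_ratio=0.10):
    """Keep only zones whose area is at least min_ratio of the image area.

    Zones are (x1, y1, x2, y2) tuples; measuring size by area is an assumption.
    """
    img_area = img_w * img_h
    kept = []
    for x1, y1, x2, y2 in zones:
        if (x2 - x1) * (y2 - y1) >= min_ratio * img_area:
            kept.append((x1, y1, x2, y2))
    return kept


# For a 700x500 image the threshold is 35,000 pixels under this assumption.
zones = [(0, 0, 200, 200), (300, 300, 400, 400)]
print(drop_small_zones(zones, 700, 500))  # [(0, 0, 200, 200)]
```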

Second, there is a variable protection area problem, as shown in Figs. 4(a)-(c). While the virtual fence generation rules are applied to fixed-size images, the protection zone size is variable. The image size used in another study [13] was deliberately predetermined so that the entire image area is equal to the sum of the fence area and the area between the fences multiplied by the integer M=5. As a result, the virtual fence generation rule is hard to follow in protection zones of varying sizes. The straightforward solution is to increase the size of the protection zone so that the fixed exposure rate and PN can be applied while conforming to the fence generation rules. The size is increased rather than decreased so that the valuable area is not exposed by the modification. However, if one protection zone meets another during the change, it must be reduced in size as an exception. Note that the height of the protection zone does not need to change because the fence generation rules do not control the fence height.
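One way to read this adjustment is as rounding the zone width to a multiple of the fence pitch (one fence plus one gap). The sketch below makes that assumption. It also assumes the zone is anchored at its top-left corner and that overlap means sharing positive area. The function names and the `pitch` parameter are hypothetical, and a zone narrower than one pitch would shrink to zero width in the exception case, which this sketch does not handle.

```python
def overlaps(a, b):
    """True if boxes a and b share a region of positive area (assumed test)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def adjust_width(zone, pitch, others=()):
    """Resize a zone so its width is an integer multiple of pitch.

    The width is rounded up by default and rounded down only when the
    enlarged zone would run into another zone. The height is unchanged.
    """
    x1, y1, x2, y2 = zone
    width = x2 - x1
    grown = -(-width // pitch) * pitch        # ceiling to a multiple of pitch
    candidate = (x1, y1, x1 + grown, y2)
    if any(overlaps(candidate, o) for o in others):
        shrunk = (width // pitch) * pitch     # exception: round down instead
        candidate = (x1, y1, x1 + shrunk, y2)
    return candidate


print(adjust_width((100, 50, 330, 400), 70))
# (100, 50, 380, 400)
print(adjust_width((100, 50, 330, 400), 70, others=[(350, 0, 500, 500)]))
# (100, 50, 310, 400)
```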

Third, there is a multiple-protection-zone problem, as shown in Fig. 4(c). The way to cope with this problem is to ensure the same image quality in all protection zones. The image quality depends on the playback rate and exposure rate. Since the same playback rate applies to every protection zone, nothing needs to be done in that respect. However, the size of the protection zone should be adjusted to keep the same exposure rate in all protection zones. The largest protection zone is selected, and its size is changed according to the solution of the second problem. For other protection zones, the fences are applied using the ratio of the largest zone width to the width of the other zones. The anchor point of the size change is the upper-left coordinate of the protection zone, so the smaller protection zones grow or shrink toward the right. If a zone reaches the edge of the image, it is virtually expanded beyond it. The virtually expanded area may have virtual fences, but they are not actually visible on the screen. Some results are shown in Fig. 5.

## 4. Simulation Results

The simulation environment is summarized as follows. Open-source software [24] was used to implement the object detection based on YOLO in darknet [23]. The YOLO software uses Python 2, TensorFlow 1.0, NumPy, and OpenCV 3. The protection zone localization is implemented in Python as well. The perceived visual quality after applying the virtual fence was objectively measured with the peak signal-to-noise ratio (PSNR) using an afterimage simulation method [13].

The data setup is the following. Five animation images were collected via the Internet from "Tangled," "Frozen," "Aladdin," "Crayon Shin-chan," and "Pokémon," as shown in Fig. 5. Among them, the Crayon image contains only one object, and the remaining images include multiple objects with overlapped or non-overlapped bounding boxes. All images were resized to 700${\times}$500 grayscale images to reduce the simulation time and complexity. Two types of virtual fence were used, a black one and a blur one. The blur fence is made by Gaussian filtering the original image to enhance the image quality.

The performance was compared to another study [13] by setting the virtual fence parameters as follows: exposure rate = 9 and playback rate = 30. Recognized images (RIs) were generated at intervals of 0.001 seconds from $t=0$ to $t=1$ second, resulting in a total of 1,000 RIs. The PSNR values were calculated by comparing the resized original image with each RI over the simulation period ($t=0,\ldots,1$). Simulation results are shown in Fig. 6.
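For reference, the PSNR between the original image and one recognized image can be computed as in the sketch below. It assumes 8-bit grayscale images (peak value 255) stored as nested lists of equal size. The afterimage simulation that produces each RI [13] is not reproduced here, and the function name is hypothetical.

```python
import math


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared_error = 0.0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            squared_error += (r - t) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf        # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


original = [[100, 100], [100, 100]]
recognized = [[100, 90], [110, 100]]   # hypothetical recognized image
print(round(psnr(original, recognized), 2))
```

Averaging this value over the 1,000 RIs of a run would give one figure per image and fence type.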

The performance of the proposed method is always better than the other method’s [13] over the entire period. The image quality continuously and periodically varies as time elapses due to virtual fences and the afterimage effect. The same simulation was performed for all test images, resulting in an average of 2.87-dB higher PSNR for the black fence and 2.59-dB higher PSNR for the blur fence. These results are summarized in Tables 1 and 2. Also, examples of recognized images at 0.573 seconds are shown in Fig. 5.

##### Table 1. Comparison of image quality (black fence).

| Data | PSNR (dB), [13] | PSNR (dB), Proposed |
|---|---|---|
| Tangled | 18.03 | 20.33 |
| Frozen | 17.12 | 18.90 |
| Aladdin | 15.73 | 17.92 |
| Crayon Shin-chan | 14.03 | 17.76 |
| Pokémon | 15.32 | 19.69 |
| Average | 16.05 | 18.92 |

##### Table 2. Comparison of image quality (blur fence).

| Data | PSNR (dB), [13] | PSNR (dB), Proposed |
|---|---|---|
| Tangled | 39.13 | 40.84 |
| Frozen | 35.48 | 40.11 |
| Aladdin | 35.66 | 36.37 |
| Crayon Shin-chan | 35.26 | 37.36 |
| Pokémon | 32.98 | 36.79 |
| Average | 35.70 | 38.29 |

## 5. Conclusion

In this study, we have proposed an adaptive secure image display scheme based on a deep object detection network, following the principle of applying protection only where it is needed on a screen display. The best average result was a PSNR of 38.29 dB with the blur fence, which is 2.59 dB higher than in previous work [13]. The displayed image may still appear to flicker to the human eye, which can cause dizziness. However, the improved image quality and reduced virtual fence coverage achieved in this experiment should help reduce eye fatigue.

### ACKNOWLEDGMENTS

This research was funded by a 2020 research Grant from Sangmyung University.

### REFERENCES

1. 2018 the market analysis of overseas contents, KOCCA.
2. Lee J., 2014, Implementation of anti-screen capture modules for privacy protection, Journal of the Korea Institute of Information and Communication Engineering, Vol. 18, No. 1, pp. 91-96
3. Charteris J., Gregory S., Masters Y., 2014, Snapchat ‘selfies’: The case of disappearing data, in Hegarty B., McDonald J., Loke S. K. (eds.), Rhetoric and Reality: Crit. Perspect. Educ. Technol., pp. 389-393
4. Kim W., Lee S., Seo Y., 2006, Image fingerprinting scheme for print-and-capture model, Pacific-Rim Conference on Multimedia, Springer Berlin Heidelberg, pp. 106-113
5. Hou J., Kim D., Song H., Lee H., 2016, Secure Image Display through Visual Cryptography: Exploiting Temporal Responsibilities of the Human Eye, in Proc. of the 4th ACM Workshop on Information Hiding and Multimedia Security, ACM, pp. 169-174
6. Ji S., Lee H., 2018, Image Recapture Prevention Using Secure Display Schemes on Polarized 3D System, IEEE Trans. Circuits Syst. Video Technol., pp. 2296-2309
7. Yamamoto H., Hayasaki Y., Nishida N., 2004, Secure information display with limited viewing zone by use of multi-color visual cryptography, Opt. Express, Vol. 12, No. 7, pp. 1258-1270
8. Yamamoto H., Hayasaki Y., 2007, Secure display that limits the viewing space by use of optically decodable encryption, in Advanced Optical and Quantum Memories and Computing IV, Vol. 6482, International Society for Optics and Photonics, p. 64820C
9. Naor M., Shamir A., 1994, Visual Cryptography, Advances in Cryptology-EUROCRYPT’94, Lecture Notes in Computer Science 950, pp. 1-12, Springer-Verlag
10. Shyu S. J., Chen M.-C., Chao K.-M., 2009, Securing information display for multiple secrets, Optical Engineering, Vol. 48, No. 5, pp. 057005
11. Yamamoto H., Suyama S., 2011, Secure display by use of multiple decoding masks based on visual cryptography, in Industry Applications Society Annual Meeting (IAS), 2011 IEEE, pp. 1-5
12. Yovo
13. Park S., Kang S., 2017, Visual Quality Optimization for Privacy Protection Bar-based Secure Image Display Technique, KSII Trans. Inf. Syst., pp. 3664-3677
14. Brettel H., Shi L., Strasburger H., 2006, Temporal image fusion in human vision, Vision Res., Vol. 46, No. 6, pp. 774-781
15. Nguyen N.-V., Rigaud C., Burie J.-C., 2017, Comic characters detection using deep learning, in Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, 9-15 November 2017, pp. 41-46
16. Khan F. S., Anwer R. M., van de Weijer J., Bagdanov A. D., Vanrell M., Lopez A. M., 2012, Color attributes for object detection, in CVPR, pp. 3306-3313
17. Zheng Y., Zhao Y., Ren M., Yan H., Lu X., Liu J., Li J., 2019, Cartoon face recognition: A benchmark dataset, arXiv:1907.13394
18. Redmon J., Divvala S., Girshick R., Farhadi A., 2015, You only look once: Unified, real-time object detection, arXiv preprint arXiv:1506.02640
19. Redmon J., Farhadi A., 2017, Yolo9000: Better, faster, stronger, in Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on, pp. 6517-6525, IEEE
20. Girshick R., Donahue J., Darrell T., Malik J., 2014, Rich feature hierarchies for accurate object detection and semantic segmentation, in Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pp. 580-587, IEEE
21. Girshick R. B., 2015, Fast R-CNN, CoRR, abs/1504.08083
22. Ren S., He K., Girshick R., Sun J., 2015, Faster r-cnn: Towards real-time object detection with region proposal networks, arXiv preprint arXiv:1506.01497
23. Redmon J., 2013-2016, Darknet: Open source neural networks in C
24. Trieu. Darkflow

## Author

##### Jinwoo Kang

Jinwoo Kang received his B.S. degree in Computer Science in 2020 from Sangmyung University, Republic of Korea. He is currently a master's degree candidate in the Department of Computer Science, Sangmyung University, Seoul, Korea. His research interests include multimedia security, reversible data hiding, and deep neural networks.

##### Sang-ug Kang

Sang-ug Kang is a professor in the department of computer science, Sangmyung University, Seoul, Korea. He received his M.S. degree in Electrical Engineering in 1995 from University of Southern California and Ph.D. degree in Information Security in 2011 from Korea University. His research interests include multimedia security, information security, and artificial intelligence.