Seungyong Lee¹, Hyokeun Lee¹, Hyuk-Jae Lee¹ and Hyun Kim²

¹ Department of Electrical and Computer Engineering, Seoul National University, Seoul, Korea
  {sylee, hklee, hyuk_jae_lee}@capp.snu.ac.kr
² Department of Electrical and Information Engineering and Research Center for Electrical and Information Technology, Seoul National University of Science and Technology, Seoul, Korea
  hyunkim@seoultech.ac.kr
            
            
Copyright © The Institute of Electronics and Information Engineers (IEIE)
            
            
            
            
            
               
                  
Keywords
               
               Non-volatile memory, Phase-change RAM, File system, Storage
             
            
          
         
            
                  1. Introduction
               
DRAM is reaching its limits due to scalability and power-consumption problems, and further progress is becoming increasingly difficult. In addition, applications that require tremendous memory resources, such as deep learning and big data management, are creating demand for new types of memory [1-3]. As a result, non-volatile memory (NVM) is emerging as a next-generation technology. In particular, phase-change memory (PCM) is a type of NVM that is expected to replace DRAM in the future because its speed is close to that of DRAM and it offers high scalability and low power consumption. PCM typically exhibits performance between that of DRAM and NAND flash memory, so it can be used as both main memory and storage.
                  			
               
               
However, it is problematic to apply PCM without considering its differences from conventional memory devices (DRAM, HDDs, and NAND flash memory). If PCM is used as main memory, its speed and bandwidth are lower than those of DRAM, so it cannot completely replace DRAM. If PCM is used as storage, significant performance degradation occurs because a double copy from PCM to DRAM takes place when the architecture consists of a DRAM cache and a block driver [4].
                  			
               
               
Furthermore, although PCM can be accessed at byte granularity, applying it directly to an existing file system limits how fully its advantages can be exploited. Because PCM has characteristics different from those of existing storage, prior knowledge about which applications suit existing storage cannot be applied directly. This is one of the reasons why many vendors have difficulty identifying target applications for PCM and launching commercial products. Consequently, a sufficient analysis of applications suitable for PCM is necessary.
                  			
               
               
Previous studies have focused on the device characteristics of PCM or on algorithms to improve its performance [5-10]. As a result, few studies have analyzed the operating environments suitable for PCM. Many studies propose PCM-optimized file systems for storage but do not analyze the operating environment or the appropriate target applications [11-13]. Other studies use PCM in a variety of settings, including hypervisors and databases, but do not identify applications specifically suited to PCM [14-16].
                  			
               
               
In response to these needs, the performance of PCM as storage was evaluated in this study by running different $\textit{Filebench}$ applications to identify workload characteristics that suit PCM. PCM was emulated in a virtual environment and mounted with a direct access (DAX) file system. Therefore, the byte-addressability of PCM, the main characteristic that distinguishes it from conventional devices, can be exploited. With the emulated PCM, the workloads were evaluated while varying the number of files, the I/O size, and the number of threads.
                  			
               
               
From this evaluation, the proper conditions for PCM were specified, and the differences between PCM and conventional storage (HDDs and SSDs) were determined. PCM outperforms conventional storage when executing write-intensive workloads with many synchronization operations, achieving up to 500${\times}$ better performance. Based on these observations, an application suitable for PCM is proposed.
                  			
               
               
                  				The rest of this paper is organized as follows. Section 2 presents the background
                  of this research. Section 3 describes the proposed methodologies for workload evaluation.
                  Section 4 shows the emulation environment and evaluation results. Finally, Section
                  5 concludes the study.
                  
                  			
               
             
            
                  2. Background
               
                  
                  			
               
               
                     2.1 Phase-change Memory
                  
PCM stores information by altering the phase of Ge$_{2}$Sb$_{2}$Te$_{5}$ (GST). The material is in either an amorphous or a crystalline state, which determines its resistance. The phase can be switched by applying different temperatures to a cell and is retained without power, so PCM is considered a representative NVM. It can be highly integrated because it scales well without requiring a large number of capacitors. Furthermore, it provides byte granularity and latency similar to those of DRAM, so it is expected to replace DRAM in the future.
                     
                     				
                  
                
               
                     2.2 Direct Access (DAX)
                  
The page cache is widely used with conventional block-device storage. It is a buffer between memory and storage that is used when a system reads or writes files. The latency of storage can be hidden by caching data in the page cache (DRAM). However, using the page cache with a memory-like block device degrades performance because of unnecessary copies. Furthermore, the page cache is not persistent, so a system crash compromises the integrity of the data it holds even if the storage itself is non-volatile.
                     				
                  
                  
DAX allows an application to access storage directly through load/store instructions. A file system with DAX capability therefore avoids duplicating data in the page cache by bypassing it. Moreover, DAX provides better performance when applications that issue many small writes or synchronization operations are executed on the target system.
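As a rough illustration of what direct access looks like to an application, the following C sketch maps a file and updates it with ordinary stores rather than read/write system calls through the page cache. It is a minimal sketch, not code from this study: the path /mnt/pmem/log.dat and the sizes are placeholder assumptions, and it presumes a DAX-capable mount (e.g., ext4 mounted with the dax option) on a kernel that supports MAP_SYNC.

```c
/* Minimal sketch (not from the paper): store data to a file on a DAX mount
 * and persist it without going through the page cache. The path
 * /mnt/pmem/log.dat is a placeholder for a file on an ext4-DAX file system. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallback definitions for older C libraries; values match the Linux ABI. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

int main(void)
{
    const size_t len = 4096;                        /* one 4-KB block */
    int fd = open("/mnt/pmem/log.dat", O_RDWR | O_CREAT, 0644);
    if (fd < 0 || ftruncate(fd, (off_t)len) != 0) {
        perror("open/ftruncate");
        return 1;
    }

    /* MAP_SYNC is only accepted on DAX-capable file systems: the mapping
     * goes straight to the device, with no page-cache copy in between. */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    memcpy(p, "record", 7);        /* ordinary store, no read()/write() syscall */
    msync(p, len, MS_SYNC);        /* make the update durable on the medium */

    munmap(p, len);
    close(fd);
    return 0;
}
```

The Filebench workloads used later in this paper issue normal read/write system calls; on a DAX file system those calls likewise bypass the guest page cache, which is the effect examined in Section 3.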
                     
                     				
                  
                
               
                     2.3 Filebench
                  
                     					$\textit{Filebench}$ is a file system and storage benchmark that is widely used
                     in academia and industry [17]. Workload Model Language (WML) is a high-level language defined by $\textit{Filebench}$
                     that makes it easy to change the configuration of workloads, observe the performance
                     of each operation, or even create new workloads in a few lines of code. In this study,
                     $\textit{Filebench}$ was used to test workloads with various characteristics and obtain
                     detailed results easily. Four representative pre-defined workloads were evaluated:
                     Fileserver, Webserver, Webproxy, and Varmail. The features of each workload are as
                     follows:
                     				
                  
                  
· Fileserver: Fileserver simulates the operation of a file server and performs various file-management commands, such as creating, writing, opening, reading, appending, and deleting files. The other workloads write only by appending, but Fileserver uses both the writewholefile and appendfile commands, so it is write-intensive.
                     				
                  
                  
· Webserver: Webserver simulates a server that receives an HTTP request, reads a large number of files, passes them to the client, and stores access records in a log file. It is read-intensive and performs no directory operations.
                     				
                  
                  
· Webproxy: Webproxy acts as a normal web proxy server that receives requests from clients and fetches resources from other servers. Reading files is its main activity, and it performs many directory operations, such as createfile and deletefile.
                     				
                  
                  
· Varmail: Varmail functions as a mail server that receives mail, reads it, and synchronizes it after marking it as read. What distinguishes it from the other workloads is the fsync command, the synchronization command of $\textit{Filebench}$.
                     
                     				
                  
                
             
            
                  3. System Description
               
Experiments were performed in a virtual environment after building a system. Fig. 1 shows an overview of the constructed system. Inside the virtual machine (guest), workloads read and write files on the virtual storage. When a workload issues read or write commands, they reach the storage through the file system.
                  			
               
               
Fig. 1(a) shows the file system layer, where ext4-DAX and ext4 are used for the virtual PCM (vPCM) and the virtual storage (vSTG), respectively. Because vPCM is mounted with a DAX-enabled file system, there is no guest page cache in vPCM's file-system layer. Therefore, applications can access the storage with load/store instructions at byte granularity. Since the file system of vSTG has a guest page cache, read and write operations are typically served by the guest page cache.
                  			
               
               
                  				Fig. 1(b) presents the virtual device layer of the guest. This layer acts as storage for the
                  guest, which is actually emulated on a file in the host storage. Therefore, commands
                  need to go through the hypervisor layer to access the actual storage. Commands to
                  the guest storage enter the hypervisor layer, arrive at the host storage, and perform
                  read/write operations on the file.
                  			
               
               
The host storage holds the file on which vPCM and vSTG are emulated. Consequently, the performance of the workloads is affected by the speed of accessing that file. Therefore, the type of host storage must be the same when comparing vPCM and vSTG with each other to isolate the benefit of the DAX feature. This study uses an HDD, an SSD, and DRAM as host storage to find the advantages of PCM over traditional storage.
                  			
               
               
                  				The performance gap between vPCM and vSTG is mainly caused by the page cache.
                  Because of the DAX feature of vPCM, read and write commands skip the guest page cache
                  and reach the host storage through the hypervisor. A command to vPCM has some overhead
                  since it always goes through the hypervisor layer and accesses the host storage. In
                  contrast to vPCM, read and write commands to vSTG are buffered once in the guest page
                  cache and then reach the storage after eviction.
                  			
               
               
If the guest page cache works well due to high locality, vSTG can hide its high access latency. On the other hand, when the fsync command is used, which flushes the page cache, vPCM's file system requires no additional work because it does not use the guest page cache. In contrast, an fsync on vSTG actually performs a flush, causing substantial direct access to the storage and significant performance degradation.
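To make the access pattern behind this gap concrete, the sketch below is an illustrative micro-test, not the paper's Filebench configuration; the file path, record size, and iteration count are arbitrary assumptions. It repeatedly appends a small record and calls fsync, which is essentially what Varmail does per mail operation. Run against a file on an ordinary ext4 mount (the vSTG path), every fsync forces the guest page cache to be written back through the hypervisor; on an ext4-DAX mount (the vPCM path), there is no cached guest data to flush.

```c
/* Illustrative micro-test (not the paper's Filebench configuration): append a
 * small record and fsync it repeatedly, timing the loop. Comparing a run on a
 * file under a plain ext4 mount (vSTG path) with one under an ext4-DAX mount
 * (vPCM path) exposes the synchronization cost discussed above. Path, record
 * size, and iteration count are arbitrary choices. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : "testfile";  /* e.g. a file on a DAX mount */
    const int iterations = 10000;
    char record[1024];
    memset(record, 'x', sizeof(record));                   /* 1-KB record */

    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) { perror("open"); return 1; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iterations; i++) {
        if (write(fd, record, sizeof(record)) != (ssize_t)sizeof(record)) {
            perror("write");
            return 1;
        }
        /* On plain ext4 this forces the guest page cache to be written back
         * through the hypervisor; on ext4-DAX there is no cached copy to flush. */
        if (fsync(fd) != 0) { perror("fsync"); return 1; }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec = (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%d append+fsync pairs in %.3f s (%.1f us each)\n",
           iterations, sec, sec * 1e6 / iterations);
    close(fd);
    return 0;
}
```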
                  
                  			
               
               
Fig. 1. Overview of the simulation system: (a) file system layer, (b) virtual storage layer.
 
             
            
                  4. Evaluation
               
                     4.1 Simulation Environment
                  
Experiments were done using a virtual machine running Linux on the QEMU emulator [18], which corresponds to the hypervisor in Fig. 1. For easy installation of the libraries, Fedora 28 (kernel version 4.18) was used. The system hosting the virtual environment uses an Intel i7-7800X running at 3.5 GHz and 16 GB of DRAM. A 64-GB PCM (vPCM in Fig. 1) was emulated on a file (storage in Fig. 1) located in DRAM, an HDD (WD20EZRZ), or an SSD (TS512GSSD230S), depending on the experiment. The emulated PCM is recognized by the virtual system using LIBNVDIMM [19], the Linux subsystem that manages NVDIMM devices, and NDCTL [20], a library for user-space control of LIBNVDIMM. The PCM and the conventional storage are mounted with ext4-DAX and ext4, respectively. The host page cache is turned off in all experiments.
                     				
                  
                
               
                     4.2 Workload Configuration
                  
The four $\textit{Filebench}$ workloads mentioned in Section 2 were evaluated. Their basic configuration, modified from previous studies, is presented in Table 1 [11,13]. The parameters are configured in a way that preserves the characteristics of each workload. Note that the number of files for Varmail (50K) is half that of the other workloads (100K) because Varmail originally has a small file set.
                     				
                  
                  
The experiments were performed while changing the number of files, the I/O size, and the number of threads. The effect of the number of files was tested by comparing the default workloads with workloads having 10 times fewer files. The change in throughput and latency was observed while increasing the I/O size from 64 B to 16 KB and the number of threads from 1 to 32. Since each run of a workload shows small variation, all experiments were repeated five times.
                     
                     				
                  
                  
Table 1. Summary of workload configuration

|                 | Fileserver | Webserver | Webproxy   | Varmail    |
|-----------------|------------|-----------|------------|------------|
| # of files      | 100K       | 100K      | 100K       | 50K        |
| File size       | 128KB      | 64KB      | 32KB       | 16KB       |
| I/O size (R/W)  | 1MB/16KB   | 1MB/8KB   | 1MB/16KB   | 1MB/16KB   |
| Directory width | 20         | 20        | 1M         | 1M         |
| Runtime         | 60s        | 60s       | 60s        | 60s        |
| # of threads    | 50         | 50        | 50         | 50         |
| R/W ratio       | 1:2        | 10:1      | 5:1        | 1:1        |
                   
                
               
                     4.3 Throughput of Default Configurations
                  
                     
Fig. 2(a) presents the $\textit{Filebench}$ throughput of the default setting for five different cases: PCM emulated on a file in an HDD (PCM-HDD), in an SSD (PCM-SSD), and in DRAM (PCM-DRAM), together with a plain HDD and a plain SSD. The experimental results show that PCM-DRAM performs best in all workloads, so it most closely approximates the performance of an actual PCM device.
                     				
                  
                  
When the other two PCM configurations (PCM-HDD and PCM-SSD) are compared with conventional storage (HDD and SSD), the PCM configurations outperform the conventional storage in the write-intensive environments (Fileserver and Varmail) thanks to the DAX feature. In particular, Varmail shows a significant performance gap because it issues many fsync commands, which do not benefit from the guest page cache. On the other hand, in the read-intensive Webserver and Webproxy, the HDD and SSD outperform the PCM configurations because the hypervisor-layer overhead becomes notable and the guest page cache brings their performance close to that of DRAM. From these results, it can be concluded that the DAX feature does not improve performance in read-intensive environments and is suitable for environments with many write and synchronization commands.
                     
                     				
                  
                  
                        Fig. 2. Throughputs on each Filebench with various memory devices and configurations.
 
                
               
                     4.4 Latency Breakdown of Each Workload
                  
                     
Fig. 3 plots the latency breakdown of the $\textit{Filebench}$ workloads. Overall, the read-intensive workloads have shorter latency than the write-intensive workloads. Fig. 3(a) shows that in Fileserver, the write commands on the HDD take 3 times longer than those on PCM-HDD and account for the largest part of the latency. In Varmail, the fsync commands account for only a small portion of the total latency of PCM-HDD, but on the HDD they take 200 times longer than on PCM-HDD and are responsible for almost all of the HDD's latency.
                     				
                  
                  
Webserver and Webproxy show similar latency-breakdown patterns on both PCM-HDD and the HDD, but the HDD performs better for the append and readfile commands. When the storage is changed to the SSD, as shown in Fig. 3(b), the overall latency is remarkably reduced because the SSD latency of writefile in Fileserver and of fsync in Varmail improves significantly. In the case of PCM-SSD, the latency of readfile in Webproxy is significantly reduced. From these results, it can be concluded that PCM performs well for the writefile and fsync commands but not for the readfile commands, which can cause many transactions.
                     
                     
                     				
                  
                  
                        Fig. 3. Latency breakdown on each Filebench with various memory devices and configurations.
 
                
               
                     4.5 Experimental Results According to Various Operating Environments
                  
                     
Figs. 2(b), 3(c), and (d) show the experimental results with regard to the number of files. A small number of files improves the performance of most workloads because of the higher locality [11], as shown in Fig. 2(b). The change for the HDD and SSD in Fileserver is especially notable. There is a significant improvement compared to the default configuration (Fig. 2(a)) because the small file set fits in the page cache, reducing the latency of the writefile commands.
                     				
                  
                  
On the other hand, as shown in Figs. 3(c) and (d), the latency of all commands is evenly reduced due to the high locality. This shows that a small number of files gives PCM no advantage: conventional storage gains a larger throughput improvement in write-intensive environments and still shows higher throughput in read-intensive environments. Changing the directory width leads to results similar to changing the number of files.
                     				
                  
                  
Fig. 4 shows the measured throughput with regard to the I/O size. The variation of the latency can be inferred by taking the inverse of the throughput. On all devices, a larger I/O size produces a stronger pre-fetch effect and fewer transactions, resulting in better performance, but the throughput converges at around 4 KB. Webserver, Webproxy, and Varmail show this tendency clearly. For Webserver, which has many read operations with high locality, convergence happens somewhat later. This result indicates that the I/O size for PCM should be set to about 4 KB, based on the convergence of the throughput.
                     				
                  
                  
Fig. 5 presents the average latency under a varying number of threads. The latency is shown on a log scale to reveal the correlation between the latency increase and the number of threads. As in the previous experiments, the variation of the throughput can be inferred by taking the inverse of the latency. For most workloads, the latency increases proportionally to the number of threads, which means that increasing the number of threads does not improve performance because of the additional latency.
                     				
                  
                  
In Varmail, the slopes of the HDD and SSD curves are very gradual because the guest page cache can be flushed for multiple threads simultaneously on the HDD and SSD. Therefore, their latency is more tolerant of a growing number of threads than that of the PCM configurations. However, even with this tolerance, it should be noted that their absolute latency is still significantly higher than PCM's. Therefore, it can be concluded that the number of threads is not a factor that gives PCM an additional advantage over conventional storage.
                     
                     				
                  
                  
                        Fig. 4. Throughput of Filebench with varying I/O size.
 
                  
                        Fig. 5. Latency of Filebench with varying the number of threads.
 
                
               
                     4.6 Application Proposal
                  
The experiments demonstrate that PCM performs better than traditional storage on workloads with the following characteristics: many synchronization commands, write-intensive commands, low locality, and a 4-KB I/O size. Hence, applications that write small units within a large amount of data and perform frequent metadata updates are proposed as targets for PCM.
                     				
                  
                
             
            
                  5. Conclusion 
               
In this study, $\textit{Filebench}$ workloads with various configurations were evaluated on a PCM-aware file system. The workload characteristics suitable for PCM were found by comparing the performance of each configuration. The simulation results showed that PCM performs well for workloads with many write or synchronization operations. Based on these observations, target properties suitable for PCM were proposed, which can be of great help in the development and use of PCM. The results are expected to contribute significantly to the commercialization of PCM.
                  			
               
             
          
         
            
                  ACKNOWLEDGMENTS
               
                  				This paper was supported in part by the Technology Innovation Program (10080613,
                  DRAM/PRAM heterogeneous memory architecture and controller IC design technology research
                  and development) funded by the Ministry of Trade, Industry & Energy (MOTIE), Korea,
                  and in part by a National Research Foundation of Korea (NRF) grant funded by the Korea
                  government (MSIT) (No. 2019R1F1A1057530).
                  			
               
             
            
                  
                     REFERENCES
                  
                     
                        
[1] Kim B., et al., Apr. 2020, PCM: Precision-Controlled Memory System for Energy Efficient Deep Neural Network Training, in Proc. 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1199-1204.

[2] Nguyen D. T., Hung N. H., Kim H., Lee H.-J., May 2020, An Approximate Memory Architecture for Energy Saving in Deep Learning Applications, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 67, No. 5, pp. 1588-1601.

[3] Lee C., Lee H., Feb. 2019, Effective Parallelization of a High-Order Graph Matching Algorithm for GPU Execution, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 29, No. 2, pp. 560-571.

[4] Kim M., Chang I., Lee H., 2019, Segmented Tag Cache: A Novel Cache Organization for Reducing Dynamic Read Energy, IEEE Transactions on Computers, Vol. 68, No. 10, pp. 1546-1552.

[5] Jiang L., Zhang Y., Yang J., 2014, Mitigating Write Disturbance in Super-dense Phase Change Memories, in Proc. 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, pp. 216-227.

[6] Lee H., Kim M., Kim H., Kim H., Lee H., Dec. 2019, Integration and Boost of a Read-Modify-Write Module in Phase Change Memory System, IEEE Transactions on Computers, Vol. 68, No. 12, pp. 1772-1784.

[7] Kim S., Jung H., Shin W., Lee H., Lee H.-J., 2019, HAD-TWL: Hot Address Detection-Based Wear Leveling for Phase-Change Memory Systems with Low Latency, IEEE Computer Architecture Letters, Vol. 18, No. 2, pp. 107-110.

[8] Wang R., et al., 2017, Decongest: Accelerating Super-dense PCM under Write Disturbance by Hot Page Remapping, IEEE Computer Architecture Letters, Vol. 16, No. 2, pp. 107-110.

[9] Lee H., Jung H., Lee H., Kim H., 2020, Bit-width Reduction in Write Counters for Wear Leveling in a Phase-change Memory System, IEIE Transactions on Smart Processing & Computing, Vol. 9, No. 5, pp. 413-419.

[10] Kim M., Lee J., Kim H., Lee H., Jan. 2020, An On-Demand Scrubbing Solution for Read Disturbance Error in Phase-Change Memory, in Proc. 2020 International Conference on Electronics, Information, and Communication (ICEIC), pp. 110-111.

[11] Xu J., Swanson S., 2016, NOVA: A Log-structured File System for Hybrid Volatile/Non-volatile Main Memories, in Proc. 14th USENIX Conference on File and Storage Technologies, pp. 323-338.

[12] Ou J., Shu J., Lu Y., 2016, A High Performance File System for Non-volatile Main Memory, in Proc. Eleventh European Conference on Computer Systems, No. 12, pp. 1-16.

[13] Dong M., Chen H., 2017, Soft Updates Made Simple and Fast on Non-volatile Memory, in Proc. 2017 USENIX Annual Technical Conference (USENIX ATC 17), pp. 719-731.

[14] Liang L., et al., 2016, A Case for Virtualizing Persistent Memory, in Proc. Seventh ACM Symposium on Cloud Computing (SOCC 2016), pp. 126-140.

[15] Mustafa N. U., Armejach A., Ozturk O., Cristal A., Unsal O. S., 2016, Implications of Non-volatile Memory as Primary Storage for Database Management Systems, in Proc. 2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), pp. 164-171.

[16] Wu C., Zhang G., Li K., 2016, Rethinking Computer Architectures and Software Systems for Phase-Change Memory, ACM Journal on Emerging Technologies in Computing Systems (JETC), Vol. 12, No. 4, pp. 1-40.

[17] Tarasov V., Zadok E., Shepler S., 2016, Filebench: A Flexible Framework for File System Benchmarking, ;login: The USENIX Magazine, Vol. 41, No. 1, pp. 6-12.

[18] Bellard F., 2005, QEMU, a Fast and Portable Dynamic Translator, in Proc. USENIX Annual Technical Conference, FREENIX Track, pp. 41-46.

[19] LIBNVDIMM: Non Volatile Devices, Linux kernel sub-system documentation.

[20] NDCTL: A library for user-space control of LIBNVDIMM.
                
             
            Author
            
            
               			Seungyong Lee received a B.S. degree in electrical and computer engineering from
               Seoul National University, Seoul, South Korea, in 2018. He is currently working toward
               integrated M.S. and Ph.D. degrees in electrical and computer engineering at Seoul
               National University. His current research interests include non-volatile memory controller
               design, processing-in-memory, and computer architecture.
               		
            
            
            
               			Hyokeun Lee received a B.S. degree in electrical and computer engineering from
               Seoul National University, Seoul, South Korea, in 2016, where he is currently working
               toward integrated M.S. and Ph.D. degrees in electrical and computer engineering. His
               current research interests include nonvolatile memory controller design, hardware
               persistent models for non-volatile memory, and computer architecture.
               		
            
            
            
Hyuk-Jae Lee received B.S. and M.S. degrees in electronics engineering from Seoul National University, Seoul, South Korea, in 1987 and 1989, respectively, and a Ph.D. degree in electrical and computer engineering from Purdue University, West Lafayette, IN, USA, in 1996. From 1996 to 1998, he was a Faculty Member at the Department of Computer Science, Louisiana Tech University, Ruston, LA, USA. From 1998 to 2001, he was a Senior Component Design Engineer at the Server and Workstation Chipset Division, Intel Corporation, Hillsboro, OR, USA. In 2001, he joined the School of Electrical Engineering and Computer Science, Seoul National University, where he is a Professor. He is the Founder of Mamurian Design, Inc., Seoul, a fabless SoC design house for multimedia applications. His current research interests include computer architecture and SoC for multimedia applications.
               		
            
            
            
               			Hyun Kim received B.S., M.S. and Ph.D. degrees in electrical engineering and computer
               science from Seoul National University, Seoul, South Korea, in 2009, 2011, and 2015,
               respectively. From 2015 to 2018, he was a BK Assistant Professor at the BK21 Creative
               Research Engineer Development for IT, Seoul National University. In 2018, he joined
               the Department of Electrical and Information Engineering, Seoul National University
               of Science and Technology, Seoul, where he is an Assistant Professor. His current
               research interests include algorithms, computer architecture, memory, and SoC design
               for low-complexity multimedia applications, and deep neural networks.