Mobile QR Code

1. (Department of AI Convergence, Pukyong National University, Pusan 48513, Korea {kdm902077, sengthai}@pukyong.ac.kr )
2. (Department of Computer Engineering, Pukyong National University, Pusan 48513, Korea youngsun@pknu.ac.kr )

Qubit mapping problem, Parallel processing, SABRE algorithm

## 1. Introduction

Quantum computers are the next-generation computers that store and calculate data based on quantum mechanics [1,2]. Quantum devices can process computationally intensive operations in a significantly shorter time compared to classical computers by utilizing quantum mechanics, such as superposition and entanglement. The D-Wave company [3] released quantum systems that are being commercialized and surpass classical supercomputers using the quantum-annealing algorithm [4,5]. On the other hand, since the D-wave was designed only for the quantum-annealing algorithm, the current technology still has many problems developing an actual quantum machine on the noisy intermediate-scale quantum (NISQ)-scale to be commercialized universally soon [6]. Qubit, the basic unit constituting a quantum computer, and the quantum gates constituting a quantum algorithm all have inherent errors. Therefore, two major problems should be solved to produce a fault-tolerant and universal quantum computer on the NISQ-scale. The first to be addressed is quantum error correction, which solves the errors inherent in qubit and quantum gate [7,8]. Currently, various studies on quantum error-correction codes have been conducted [9-13]. Nevertheless, thousands of qubits are required to make a fault-tolerant qubit in a real quantum machine [14]. A considerable number of qubits are required for a meaningful quantum algorithm on the NISQ-scale. Second, the quantum circuit must be executed quickly within the quantum coherence time [15]. The quantum coherence time refers to the time at which the information of the qubits of an actual quantum machine can be preserved without change. In particular, the same result as the simulation can be obtained if the quantum algorithm is executed successfully within the quantum coherence time on an actual quantum machine. Currently, there is no real quantum machine capable of supporting a general-purpose quantum circuit on the NISQ-scale that can satisfy the two major problems. Thus, quantum computer simulations must be utilized. Quantum simulation platforms, such as Qiskit [16], QuTiP [17], and t|ket> [18], provide several useful tools to implement, verify quantum algorithms, and evaluate their performance. In the case of Qiskit, the NISQ-scale quantum algorithm implemented by the programmer can be operated using a simulator with the same characteristics as the actual quantum machine. Consequently, it allows a researcher to implement and analyze quantum circuits quickly.

With the recent developments of quantum computer technology, various quantum algorithms have been proposed. Quantum algorithms are implemented as quantum circuits based on quantum gates, and a quantum compiler is required to run these quantum circuits on a quantum machine. Quantum qubit mapping is the core function of a quantum compiler that maps logical qubits to physical qubits. The main difference between these two qubits is the qubit connectivity, and the states in which the qubits are connected are different. In a logical qubit, a random qubit is connected to all other qubits. Thus, all the logical qubits are connected to each other. In a physical qubit, however, only specific qubits are connected to each other. Qubit connectivity is important because a two-quantum gate cannot be executed directly if a two-quantum gate to be executed in a real quantum machine is applied to qubits that are not connected. Qubit mapping is required to solve this problem. On the other hand, qubit mapping is an NP-complete problem [26]. Thus, the compilation time increases exponentially with increasing number of qubits. Therefore, the execution time of qubit mapping should be reduced to decrease the compilation time.

This paper proposes a novel qubit mapping algorithm that utilizes the existing qubit mapping algorithm called SABRE [19]. The SABRE algorithm minimizes the number of quantum gates added due to qubit mapping. By running the SABRE algorithm in parallel, this study focused on improving the speedup of the execution time of qubit mapping. For performance verification, 10 NISQ-scale quantum circuits were used as benchmarks to measure the execution time of qubit mapping compared to the SABRE algorithm. According to the experimental results, compared to SABRE, the proposed method has an overhead of up to 1.05x and an average of 1.03x in terms of the gate, resulting in improvements of up to 6.24 times with an average of 4.8 times in terms of speedup.

The major contributions of this paper can be summarized as follows:

• This study analyzed the characteristics of previous studies for the qubit-mapping algorithm and identified the goals considered essential for qubit mapping.

• A new method was proposed to run the SABRE algorithm in parallel to reduce the compilation time of quantum compilers.

• The scalability of the proposed method was shown; increasing the number of cores during parallelization will outperform the well-known qubit-mapping algorithm for NISQ-scale quantum circuits.

The remainder of this paper is organized as follows. Section 2 describes the Qubit mapping problem. In Section 3, the proposed qubit-mapping algorithm is explained, and the performance of the method is verified in Section 4. Section 5 discusses the issues that should be resolved in future research, and Section 6 concludes the paper.

## 2. Background and Related Works

This section explains Qubit mapping, which is the focus of the study. In addition, the contents necessary for each paper are mentioned, and the work performed in this study is outlined by explaining recent studies to solve the Qubit mapping problem effectively.

### 2.1 Qubit Mapping Problem

Running a quantum circuit using simulation is not the same as running in a real quantum machine. Qubits used in quantum simulations are called logical qubits, while those used in actual quantum machines are called physical qubits. These two qubits run the quantum circuit under different conditions, and the condition considered in this study is the Qubit connectivity.

Qubit connectivity contains information on how different qubits are connected and can be expressed in a graph called a coupling map. In the case of logical qubits, because they are all connected to each other, the coupling map is not considered when executing the quantum circuit on a quantum simulator. By contrast, because not all physical qubits are connected to each other, the quantum circuit is executed in consideration of the coupling map. Fig. 1 shows the coupling map of a real quantum machine, IBM Q Tokyo. In the coupling map of IBM Q Tokyo, the circles indicate physical qubits, and straight arrows indicate that physical qubits are connected to each other. For example, qubits 6 and 7 are connected to each other, but qubits 6 and 12 are not. Hence, there are limitations in operating the quantum circuit because the physical qubits of the actual quantum machine are only connected to specific qubits.

##### Fig. 1. Coupling map of a 20-qubit IBM Tokyo quantum machine.

A quantum circuit consists of several qubits and quantum gates. Quantum gates can be divided into single-qubit gates and multi-qubit gates. When executing a quantum circuit composed of these quantum gates in an actual quantum machine, it is necessary to check whether multi-qubit gates can be executed without additional work. That is, if the physical qubits to which the multi-qubit gates are applied are not connected to each other, these multi-qubit gates cannot be executed directly on an actual quantum machine. To solve this problem, it is necessary to connect the physical qubits that are separated. On the other hand, because the location of the physical qubit cannot be changed, the logical qubit must first be mapped to the physical qubit on the coupling map. Some of the logical qubits are separated from each other and some are connected according to the coupling map. To connect the separated physical qubits, it is necessary to connect the logical qubits mapped to the corresponding physical qubits. To this end, many SWAP gates that change the positions of the logical qubits are added to the original quantum circuit. Accordingly, the number of gates in the quantum circuit increases, which increases the execution time of qubit mapping. That is, to alleviate this problem, it is necessary to minimize the total number of gates using only a minimum number of SWAP gates.

Fig. 2 shows two different quantum circuits in which arbitrary quantum circuits can be produced through qubit mapping: (a) is a coupling map composed of four qubits, and (b) is a quantum circuit composed of four qubits and six quantum gates. If qubit mapping is performed considering the coupling map of (a) provided in (b), a quantum circuit is constructed, as shown in (c). The CNOT gate, which is indicated by the red dotted line in circuit (b), is applied to the first and third qubits. In the coupling graph of (a), the first and third qubits are not connected to each other. Thus, they cannot be executed directly on the quantum machine. The CNOT gate represented by the blue dotted line is a CNOT gate that can be affected by the SWAP gates added for the CNOT, which are represented by the red dotted line to be executed. When Qubit mapping is performed, two possible results can be observed: (c) and (d). In (c), two swap gates were added, and in (d), only one swap was added. Therefore, for a NISQ-scale quantum circuit, depending on the qubit mapping algorithm, the number of swap gates added, and the execution time of qubit mapping may be affected.

### 2.2 Qubit-mapping Algorithms

Recent studies related to Qubit mapping have been conducted to implement NISQ-scale quantum circuits in real quantum machines. Cowtan et al. [20] performed Qubit mapping using the t|ket> quantum compiler in the architecture of IBM QX5, a real quantum machine with 16 qubits, and IBM Tokyo with 20 qubits. Cowtan et al. [20] aimed to minimize the depth of the added quantum gate and quantum circuit. For performance verification, from a significantly small-scale quantum circuit with five quantum gates to a NISQ-scale quantum circuit as a benchmark, the benchmarks were run together in the compiler systems of Qiskit, Project Q, and Qulic to compare the performance. In the performance comparison, however, it is difficult to accurately compare the performance with t|ket> because the number and depth of quantum gates and depth of Qiskit and Qulic for approximately 3000 or more quantum circuits in the IBM Q Tokyo machine used by Cowtan et al. [20] were not measured. In addition, the execution time required for qubit mapping in the three compilers was not analyzed.

A previous study [21] used a heuristic algorithm using the gate commutation rule and the gate transformation rule to reduce the number of additional gates. In particular, the number of additional gates required for qubit mapping was minimized using the Bridge gate and the SWAP gate. On the other hand, the quantum circuit was used as a benchmark for performance verification contained up to 38577 quantum gates, which is not sufficiently large. Comparison analysis of the circuit execution time was not performed sufficiently. Zulehner et al. [22] performed qubit mapping that minimizes the number of gates added to the IBM QX architecture and reported superior performance compared to the default qubit-mapping algorithm provided by Qiskit at the time the thesis was written. Cowtan et al. [20] developed and verified the superiority of the SABRE algorithm by comparing its performance with that by Zulehner et al. [22] in terms of the number of additional quantum gates, total number of gates, and quantum qubit mapping execution time. On the other hand, the benchmarks used for performance verification included a maximum of 34881 quantum gates, which is significantly smaller than the number of quantum gates on the NISQ-scale to be addressed in this study. Zhang et al. [23] performed qubit mapping to minimize the circuit depth using a depth-aware SWAP insertion scheme. The reported performance of the method [23] was verified by comparing it with the SABRE algorithm and the method reported by Zulehner et al. [22].

Compared to the two methods, the method reported by Zhang et al. [23] showed superior performance in terms of additional gates and depth. On the other hand, the benchmarks used by Zhang et al. [23] are also not NISQ-scale quantum circuits, and a detailed analysis of the execution time was not performed except for the statement that the expected execution time can be equated with the depth. Niu et al. [24] reported the performance of reducing the number of additional gates by 28\% on average compared to SABRE on the ibmq\_almaden machine and 14\% on the ibmq\_tokyo machine by proposing a hardware-aware mapping transition algorithm. On the other hand, because Niu et al. [24] also benchmarked the quantum circuit used in SABRE, the execution time of the NISQ-scale quantum circuit was not compared and analyzed. Therefore, the present study used the NISQ-scale quantum circuit as a benchmark to analyze the execution time, which was insufficient in previous papers, and reduce it compared to the existing methodology. The circuit consists of three sub-processes using the SABRE algorithm, and the execution time of SABRE was compared with the proposed method for performance verification.

## 3. Parallelized Qubit Mapping Algorithm

This section describes the novel Qubit mapping algorithm proposed in this paper in more detail. Fig. 3 illustrates the overall process of the proposed method. It is a parallelization of the existing SABRE algorithm through a multiprocessing module in Python and consists of three-step processes. Each process is Circuit Split, Parallelized SABRE and Reversed SWAP and Circuit Merge.

### 3.1 Proposed Method

First, the quantum circuit was split into several quantum subcircuits. By doing this, each quantum subcircuit could be computed in parallel. Circuit splitting processes on quantum circuit objects of the Qiskit library. In this version, every quantum circuit that contains more than 300 gates must be split. There is a trade-off between the number of gates per quantum subcircuit and the number of core processes used. This trade-off is mentioned in section 4.2.

Second, one quantum circuit was divided into quantum subcircuits with the same number of quantum gates, and the SABRE algorithm was applied to the quantum subcircuits to perform qubit mapping. At this time, because SABRE mapping is performed on a relatively small quantum sub-circuit compared to the original circuit, the size of a Directed Acyclic Graph (DAG) [19] of each quantum sub-circuit is also relatively small. The size of the DAG has a significant impact on the execution time of the qubit mapping because the DAG is checked multiple times during qubit mapping. In addition, because SABRE mapping is applied to each quantum subcircuit in parallel, the execution time of qubit mapping is faster than that of one-by-one. After SABRE mapping, the position of the last logical qubit of each quantum subcircuit is different from the initial position because the SWAP gates are added to each quantum subcircuit. The quantum gate is not applied to the correct qubit if all quantum subcircuits are combined under these conditions. Thus, the measurement result will be incorrect. To address the position changes, the logical qubit position must be moved back to the initial position. To achieve this, when performing SABRE mapping for each quantum subcircuit, the initial logical qubit position is arranged from the first qubit to the last qubit in order. When SABRE mapping is finished, the position of the changed logical qubit is rearranged from the first qubit to the last qubit sequentially. Different permutations are added to each circuit to rearrange the physical qubits. The permutation describes the current position of the logical qubit that allows the addition of reversed swap gates to move the qubit back to the initial position. The permutation pattern is extracted by identifying which qubits are swapped when the SABRE mapping operates. This pattern is different for each quantum subcircuit, and the positions of the last logical qubits of all quantum subcircuits are rearranged sequentially through permutation. In addition, this reversed swap process is also executed in parallel to accelerate the execution time.

Finally, each quantum subcircuit on which SABRE mapping and permutation have been performed must be combined into one quantum circuit again. Since all quantum subcircuits have physical qubits arranged in order by permutation at the end of the circuit, they can be converted quickly into a single quantum circuit with a simple '+' operation without additional work. The final quantum circuit has a different configuration from the circuit mapped by the SABRE algorithm; however, the result after the measurement is the same. Therefore, the accuracy of the quantum circuit mapped through the proposed method is reliable.

### 3.2 Example of Proposed Method

Fig. 4 is an arbitrary quantum circuit with four qubits and a total of 19 quantum gates. Assume that a given quantum circuit is operated in a quantum machine with a coupling map, as shown in Fig. 2(a). First, as shown in Fig. 5, a quantum circuit is divided into three quantum subcircuits with six quantum gates through the circuit split process. Subsequently, the red dotted line of each quantum subcircuit cannot be executed directly. Hence, it becomes a quantum gate that requires an additional swap, and the blue dotted line becomes a quantum gate that can be changed owing to the additional swap.

Fig. 6 presents three quantum subcircuits to which SABRE mapping is applied. Each swap gate is applied at a different location. Hence, the location of the final physical qubit is different from the first location. Therefore, before combining them as one quantum circuit, each permutation is applied in Fig. 7. The square brackets in permutation are the pattern of permutation, and each quantum subcircuit has a different pattern. Consequently, the position of the final qubit becomes the same as the position of the first qubit, which avoids applying a quantum gate to the wrong qubit when merging. Fig. 8 shows that it has become one quantum circuit again after the Circuit Merge process.

## 4. Performance Evaluation

This section reports the performance of the proposed qubit-mapping algorithm by comparing it with the SABRE algorithm. Ten NISQ-scale quantum circuits are used as benchmarks, and the execution time of qubit mapping, the number of added quantum gates, and the circuit depth are analyzed.

### 4.1 Speedup

Fig. 9 shows that the execution time of the proposed method is improved compared to the SABRE execution time. All experiments in this study were executed on a server with an AMD EPYC 7R32 CPU up to 3.4GHz (32 physical cores, two threads per core) and 128GB memory. The operating system was Ubuntu 20.04.2 LTS. The Python and Qiskit versions were 3.8.10 and 0.30.1, respectively. A Python multiprocessing module was used to perform the SABRE algorithm in parallel, and the number of processes depends on the number of CPU cores. Ten NISQ-scale quantum circuits were used as benchmarks, and the number of CPU cores was doubled from two to use up to 32 cores.

The Qiskit simulator was used to verify the accuracy of the experimental results and confirm that the results of the original quantum circuit, the circuit to which SABER was applied, and the circuit to which the proposed method was applied were all the same.

Because the proposed method executes SABRE mapping in parallel, the execution time speedup is improved approximately twofold when the number of cores is doubled. The proposed method in most quantum circuits outperformed SABRE. In the case of $pm\_4096$ with full name $plus63mod4096\_163$, using 32 cores, it was possible to obtain a speedup improvement of 6.4x compared to SABRE. On average, the proposed method obtained 1.5x, 2.5x, 3.5x, 4.5x, and 4.8x faster for 2, 4, 8, 16, and 32 cores, respectively, compared to SABRE.

The proposed method ran the SABER algorithm in parallel while increasing the number of CPU cores from 2 to 32. In some benchmarks, the execution time of the proposed method did not increase linearly when the number of cores was doubled. The proposed algorithm consisted of three parts. The circuit split and circuit merge process take time because only the part that performs qubit mapping was in parallel. Therefore, the execution time of the proposed method did not improve the performance completely linearly according to the number of cores.

Table 1 lists the total number of quantum gates and the circuit depth generated by the SABRE and proposed method. The comparison normalized the results of SABRE and presented the results for the number of SWAP gates added using the proposed method, total number of gates, and depth. Because the proposed method included the permutations, there was a possibility that the number of swap gates to be added would be larger than that of the SABRE algorithm. On average, 1.25x swap gates were added compared to SABRE. On the other hand, in terms of the total gates, compared to SABRE, the maximum value was 1.05x, and the average was 1.03x, indicating a relatively small number compared to the added swap gates.

The depth of a quantum circuit is the total number of layers in which quantum gates can be executed simultaneously [25]. That is, quantum gates included in one layer can be executed in parallel; hence, the depth can significantly affect the overall execution time of the circuit. The depth of the proposed method increased up to 1.06x with an average of 1.03x compared to SABRE.

This study analyzed the methods to improve the execution time performance of the proposed method for future studies. The SABRE algorithm was executed in parallel by dividing one quantum circuit into multiple small-scale quantum subcircuits, doubling from at least two CPU cores to 32 for each quantum subcircuit. The results are presented in Fig. 9. Some specific benchmarks, such as $misex1\_241$, i.e., the number of cores, were doubled, but the speedup improvement was decreased. This is because the quantum subcircuits are assigned randomly to each core without considering the computational load of each quantum subcircuit when executing this method. That is, the load-balancing issue was not considered. Therefore, the consistent speedup improvement of the method according to the number of cores will need to be determined by adding a load-balancing algorithm optimized to the proposed method in future research.

## 5. Conclusion

This paper reported the parallelized qubit mapping algorithm that utilizes SABRE to accelerate the execution time of qubit mapping. The proposed method takes a quantum circuit as an input, and then splits it into multiple quantum subcircuits. Next, it calculates the SABRE mapping to each of the quantum subcircuits simultaneously powered by multiple cores. Finally, it accumulates results from SABRE and merges back into a single quantum circuit. The proposed method was tested based on the number of gates, depth and execution time with 10 NISQ-scale benchmarks. Because this method used additional swap gates to swap back the location of qubits, it has 1.03x more gates than SABRE on average. On the other hand, with the advantage of parallelization, in terms of the execution time, a speedup improvement of up to 6.42x with an average of 4.8x was obtained with 32 cores compared to SABRE. Future work will extend this method to manage the assigned quantum subcircuits to each core more efficiently.

### ACKNOWLEDGMENTS

This work was supported by Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) [No. 2020-0-00014, A Technology Development of Quantum OS for Fault-tolerant Logical Qubit Computing Environment]

### REFERENCES

1
Steane A., Feb. 1998, Quantum computing, Rep. Prog. Phys., Vol. 61, No. 2, pp. 117-173
2
Hey T., Jun. 1999, Quantum computing: An introduction, Comput. Control Eng. J., Vol. 10, No. 3, pp. 105-112
3
(accessed Oct. 18, 2021), D-Wave Systems - The Practical Quantum Computing Company., https://www.dwavesys.com
4
Hu F., et al. , Apr. 2020., Quantum computing cryptography: Finding cryptographic Boolean functions with quantum annealing by a 2000 qubit D-wave quantum computer, Phys. Lett. A, Vol. 384, No. 10, pp. 126214
5
Hu F., Lamata L., Wang C., Chen X., Solano E., Sanz M., May 2020., Quantum Advantage in Cryptography with a Low-Connectivity Quantum Annealer, Phys. Rev. Applied, Vol. 13, No. 5, pp. 054062
6
Preskill J., Aug. 2018, Quantum Computing in the NISQ era and beyond, Quantum, Vol. 2, pp. 79
7
Lidar D. A., Brun T. A., 2013, Quantum Error Correction., Cambridge University Press
8
Devitt S. J., Munro W. J., Nemoto K., Jun. 2013., Quantum error correction for beginners, Rep. Prog. Phys., Vol. 76, No. 7, pp. 076001
9
Steane A. M., May 1999, Efficient fault-tolerant quantum computing, Nature, Vol. 399, No. 6732, pp. 124-126
10
Linke N. M., et al. , Nov. 21, 2016., Fault-tolerant quantum error detection, Sci. Adv., Vol. 3, No. 10, pp. e1701074
11
Chiaverini J., et al. , Dec. 2004, Realization of quantum error correction, Nature, Vol. 432, No. 7017, pp. 602-605
12
(accessed Oct. 18, 2021), Stabilizer Codes and Quantum error correction - ProQuest.
13
Brun T. A., Oct. 2019, Quantum Error Correction
14
Abdessaied N., Wille R., Soeken M., Drechsler R., 2013, Reducing the Depth of Quantum Circuits Using Additional Circuit Lines, in Reversible Computation, vol. 7948, G. W. Dueck and D. M. Miller, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 221-233.
15
Xi Z., Li Y., Fan H., Jun. 2015., Quantum coherence and correlations in quantum system, Sci Rep, Vol. 5, No. 1, pp. 10922
16
(accessed Oct. 18, 2021), Qiskit., https://qiskit.org
17
(accessed Oct. 18, 2021), QuTiP - Quantum Toolbox in Python., https://qutip.org
18
Apr. 20, 2020., New t-ket>TM Release, Cambridge Quantum Computing(accessed Oct. 18, 2021).
19
Li G., Ding Y., Xie Y., Apr. 2019, Tackling the Qubit Mapping Problem for NISQ-Era Quantum Devices, in Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, Providence RI USA, pp. 1001-1014
20
Cowtan A., Dilkes S., Duncan R., Simmons W., Sivarajah S., On the qubit routing problem, pp. 29
21
Itoko T., Raymond R., Imamichi T., Matsuo A., Jan. 2020, Optimization of quantum circuit mapping using gate transformation and commutation, Integration, Vol. 70, pp. 43-50
22
Zulehner A., Paler A., Wille R., Jul. 2019, An Efficient Methodology for Mapping Quantum Circuits to the IBM QX Architectures, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., Vol. 38, No. 7, pp. 1226-1236
23
Zhang C., Chen Y., Jin Y., Ahn W., Zhang Y., Zhang E. Z., A Depth-Aware Swap Insertion Scheme for the Qubit Mapping Problem, arXiv preprint, pp. 7
24
Niu S., Suau A., Staffelbach G., Todri-Sanial A., 2020, A Hardware-Aware Heuristic for the Qubit Mapping Problem in the NISQ Era, IEEE Trans. Quantum Eng., Vol. 1, pp. 1-14
25
(accessed Oct. 14, 2021), Quantum Circuits (qiskit.circuit) - Qiskit 0.31.0 Documentation.
26
Zhang Y., Deng H., Li Q., Jan. 2020, Context-Sensitive and Duration-Aware Qubit Mapping for Various NISQ Devices, arXiv:2001.06887 [quant-ph]

## Author

##### Dongmin Kim

Dongmin Kim received a Bachelor's degree in Computer Engineering and Applied Mathematics, Pukyong National University, Busan, South Korea, in 2021. He is currently working toward the MS degree in the Department of AI Convergence, Pukyong National University, Busan, South Korea. His research interests include compiler, memory systems, and quantum computing. Particularly, quantum annealing and quantum machine learning.

##### Sengthai Heng

Sengthai Heng received his Bachelor's degree in Information Technology from the University of Cambodia, Phnom Penh, Cambodia, in 2019. He presently is a master's degree student of AI convergence at Pukyong National University (PKNU), Pusan, Republic of Korea. His current research interests include quantum computing, high-performance computing, and AI.

##### Youngsun Han

Youngsun Han received his B.E. and Ph.D. degrees in electrical and computer engineering from Korea University, Seoul, Republic of Korea, in 2003 and 2009, respectively. He was a senior engineer from 2009 to 2011 with System LSI, Samsung Electronics, Suwon, Republic of Korea. From 2011 to 2019, He was an assistant/associate professor with the Department of Electronic Engineering at Kyungil University, Gyeongsan-si, Republic of Korea. Since 2019, he has been an associate professor with the Department of Computer Engineering, Pukyong National University, Pusan, Republic of Korea. His research interests include high-performance computing, emerging memory systems, compiler construction, and SoC design.