In recent years, the advancement of quantum computers has been improving, with its use increasing in practical applications, including noisy intermediate-scale quantum (NISQ) technology. However, an obstacle that limits quantum computer usage is the connectivity restriction between logical and physical qubits. For performing quantum entanglement on NISQ machines, two logical qubits must be mapped to physical qubit pairs that much up to the connectivity constraint. This mapping process must insert additional operations into an initial quantum circuit. Many powerful algorithms have been proposed to solve the mapping problem, which also mitigate circuit size or depth and improve the fidelity of the quantum state. In this review paper, we present various studies for solving qubit mapping or allocation problems and analyze their time complexity and methods.

※ The user interface design of www.jsts.org has been recently revised and updated. Please contact inter@theieie.org for any inquiries regarding paper submission.

### Journal Search

- (Department of AI Convergence, Pukyong National University, Pusan 48513, Korea sengthai@pukyong.ac.kr)
- (AI Department of Computer & Information Technology, Incheon Jaeneung University, Incheon, Korea misoh049@jeiu.ac.kr )
- (Department of Computer Engineering, Pukyong National University, Pusan 48513, Korea youngsun@pknu.ac.kr)

## 1. Introduction

In the past several years, advanced quantum technologies have been explored and
developed rapidly in terms of hardware and software. Many approval quantum algorithms
are solving tasks significantly faster than classical ones, such as the Glover, Deutsch,
and Shor algorithm. In this study, we examine how quantum computing improves other
fields, such as quantum cryptography (which strengthens digital security) ^{[1]}, artificial intelligence (AI) ^{[2]}, and quantum chemistry ^{[3]}. We also explore the early-stage quantum computing of commercial use cases, such
as quantum simulation, quantum-assisted optimization, and quantum sampling.

All of these faster computing fields can empower beneficial new machine learning,
finance, and health care ^{[4]}. Consequently, major industrial and start-up companies are racing to build quantum
computers, including IBM ^{[5]}, Intel ^{[6]}, Rigetti ^{[7]}, IonQ ^{[8]}, Psi-Quantum ^{[9]}, and Google ^{[10]}. We are in a new era where we can run quantum algorithms on a physical quantum machine
rather than a quantum simulator.

However, there is a gap between quantum hardware and quantum algorithms because
of the limited number of qubits, circuit compilation, and noise that can interfere
with the computation. The number of qubits is the key to building a practical quantum
application. For example, a quantum computer would require 100 million qubits to factorize
a 2,000-bit number ^{[4]}. Google plans to reach one million qubits in 10 years ^{[11]}.

Noise and decoherence are significant problems in building a stable and reliable quantum computer, and it also disturbs the fidelity of quantum algorithms. Accordingly, solving this problem tends to require more qubits for error correction. However, having sufficiently powerful quantum hardware alone cannot achieve milestones (quantum supremacy). We require a compiler that can manipulate the physical qubits significantly enough to improve the fidelity and execution time.

Compiling quantum programs for execution on quantum hardware still requires a
classical device to map qubits in logic to physical qubits in devices, which is called
$\textit{qubit allocation}$. The qubit allocation problem is a concern because in
a quantum program, there is a special operation (entanglement state) that acts on
two qubits. Moreover, quantum computers, such as the noisy intermediate scale quantum
(NISQ) type, enable this special operation only if physical qubits are nearby or interact
with each other. Therefore, a quantum program cannot be perfectly executed directly
on an NISQ device. It is impossible to copy the state of qubits due to the no-cloning
theorem ^{[12]}. Furthermore, many solutions can exist with different numbers of gates or final state
fidelities. Mapping logical to physical qubits has been proven as NP-complete problem
^{[13]}.

Many technological methods addressing the allocation problem have been proposed.
In this paper, we examine useful methods in two categories: (1) depth optimal and (2) variation-aware qubit allocation. Zulehner et al. proposed a solution using combinatorial
optimization called the A* algorithm ^{[14]}, which ran faster than the IBM solution available at that time. A year later, the
$\textit{sabre}$ algorithm was introduced by Li et al. ^{[15]} and outperformed the A* algorithm. $\textit{Sabre}$ provides a search algorithm and
design heuristic cost function with look-ahead ability. A good solution may be hidden
behind high barriers that are difficult to overcome, so the HA algorithm fostered
a new optimum cost function for $\textit{sabre}$ that can process 28% more efficiently
than $\textit{sabre}$ in terms of reducing circuit size ^{[16]}.

Simultaneously, Zhu et al. ^{[17]} proposed a dynamic look-ahead mapping and expansion-from-center scheme that helps
to overcome the qubit allocation problem. In a different methodology, Siraichi et
al. used graph-based algorithms such as subgraph isomorphism and token swapping, labeling
it the BMT algorithm ^{[18]}. Optimization algorithms from QURE ^{[19]} help to improve the fidelity of the circuit by finding isomorphic subgraphs or using
QURE’s greedy strategy.

The rest of this paper is organized as follows. In Section 2, we briefly describe quantum computing, and in Section 3, we describe the qubit mapping problem. Section 4 presents depth optimal qubit allocation (DOQA) methods, and variation-aware qubit allocation (VQA) methods are detailed in Section 5. Finally, we discuss the time complexity of all six methods in Section 6.

## 2. Background

### 2.1 Foundation of Quantum Computing

The basic unit of classical computing is a bit, with the concept of switching off and on (zeros and ones). Similarly, quantum computing uses a quantum bit called a qubit. A qubit can be off, on, or both simultaneously, the so-called superposition state. These qubits can be written using sophisticated Dirac notation as $\left.|0\right\rangle $ and $\left.|1\right\rangle $. The qubit state can be described as a linear superposition:

The amplitudes $\alpha $ and $\beta $ are complex numbers, and $\alpha ^{2}+\beta ^{2}=1$. This qubit can be measured by $\left| \alpha \right| ^{2}$ and $\left| \beta \right| ^{2}$. The values $\left.|0\right\rangle $ and $\left.|1\right\rangle $ are the basis states that can also be written as two-dimensional vectors.

A classical computer applies logic gates to bits, including AND, OR, NOT, NOR, NAND, or XOR gates. Likewise, a quantum gate is expressed with a single unitary matrix and applied on a qubit. For instance, a Pauli-X gate or NOT gate (denoted as X in a quantum circuit) rotates the Bloch sphere 180$^{\circ}$ on the x-axis. The Pauli-X gate turns the qubit in the opposite direction: $X\left.|0\right\rangle \rightarrow \left.|1\right\rangle $. The Hadamard-Walsh gate H is a well-known single gate that modifies the basis state to the superposition state.

Quantum gates are reversible, such that $H\otimes H$ = I (identity matrix). For
example, the first two H gates on $q_{3}$ in Fig. 1 can be removed. The CNOT gate is an entangling gate that acts on two different qubits,
CNOT$\left.|ab\right\rangle $. If the control qubit $a$ is $\left.|1\right\rangle
$, the target qubit $b$ will be flipped to the opposite direction when applying the
Pauli-X gate. Hence, if the control qubit is $\left.|0\right\rangle $, the gate has
no actions on target qubit $b$. As shown in Fig. 1, the control and target qubits are connected by a vertical line and denoted as ${\bullet}$
and $\otimes $, respectively. An elementary CNOT gate and a set of single-qubit gate
compositions can be expressed as an arbitrary quantum circuit ^{[20]}.

### 2.2 Quantum Circuit

A quantum circuit is a digital computation board (Fig. 1) that accumulates quantum gates to represent how an algorithm or quantum program is executed. A quantum circuit is executed from left to right like a musical score that plays an algorithm as reading along. In Fig. 1, the square box represents a quantum gate that is interconnected by a quantum wire (horizontal line). Each quantum wire line represents a qubit with an initial state. The vertical line with ${\bullet}$ and $\otimes $ is a CNOT gate.

The two concepts of quantum circuits are the size and depth of the circuit. Quantum
circuit size is the number of elementary gates, and circuit depth refers to the number
of time steps (time complexity) executed from input to output gates ^{[21]}. As shown in Fig. 1, the circuit size is 17, and the circuit depth is 8. The qubits existing in the quantum
circuit are commonly known as logical qubits.

### 2.3 NISQ Hardware

There are various quantum architecture systems, such as NISQ ^{[22]}, fault tolerant (FT) ^{[23]}, and quantum annealing systems ^{[24]}. An NISQ device refers to a quantum computer that does not have enough qubits to
correct the error (noise) but is sufficiently scalable to perform computational tasks
^{[15,}^{22]}. However, it is still limited by the number of applicable qubits and accuracy. Quantum
computers are very constrained by the limited number of qubits available with poor
qubit connectivity and noise problems. These noise problems cause a quantum computer
to malfunction, including gate error, T1 relaxation, and T2 dephasing ^{[19]}. Consequently, if the quantum system can store and process information reliably,
the system must be kept perfectly isolated from the outside world ^{[22]}.

The NISQ machine enables only local communications between qubits that must be
physically connected. The qubits used in hardware are called $\textit{physical qubits}$.
Connectivity, such as CNOT gates between physical qubits, is restricted by the hardware
itself. There are two types of physical qubit connections in NISQ devices: directed/bidirected
coupling graphs ^{[15]} and full connections (all-to-all connection) ^{[25]}. In IBM QX, quantum computer models used to be designed on a directed coupling graph
in which the communication of qubits is expressed by an arrow point. In Fig. 2(a), $Q_{3}$ can be directly connected with $Q_{4}$, but not vice versa. In a recent
IBM computer model, two coupled qubits are connected by a bidirectional arrow, where
all nearby qubits can interact with each other, as shown in Fig. 2(b).

The distance between these qubits’ connectivity and the imperfection of operation
gates also affects the fidelity of quantum states. Fidelity measures two relationships
between two quantum states. If one state is a significant distance away, the fidelity
will decrease. However, higher fidelity means a higher probability of correct output
^{[15,}^{19]}.

## 3. Qubit Mapping Problem

Qubit allocation is a mapping algorithm that allocates a physical qubit in quantum
hardware to a logical qubit of a quantum circuit ^{[13]}. Qubit allocation solves the problem that occurs when two qubits are not directly
connected in the quantum hardware but need to be entangled in a quantum circuit. A
perfect mapping algorithm does not exist for an arbitrary quantum circuit on NISQ
devices because of hardware constraints, although it could improve over time.

Currently, however, the qubit allocation problem has not been unsolvable: the
simple solution is to move the logical qubit or transform the gate by inserting transformation
gates. A transformation gate composed of elementary gates is used to emulate semantic
CNOT relations or switch the physical qubit state. The well-known transformation gates
in qubit mapping solutions are swap, reversal, and bridge gates. These gates have
different behaviors on qubits ^{[13]}.

The swap gate swaps two logical qubits’ states. As shown in Fig. 3(a), a swap gate can be decomposed into three CNOT gates, one of which is in the opposite direction. To make it the same direction, the swap gate can also be decomposed into three CNOT gates and four Hadamard gates. The reversal and bridge gates perform a similar job. The reversal gate in Fig. 3(b) emulates the CNOT ($q_{0},q_{1}$) to CNOT ($q_{1},q_{0}$) with an extra two Hadamard gates. The bridge gate also emulates the CNOT gate that skips a qubit nearby, as shown in Fig. 3(c), which is decomposed into four CNOT gates.

These transformation gates are used to address the qubit allocation problem. Assume the quantum circuit Fig. 4(a) map is directed to the coupling graph in Fig. 2(a). In the initial state, logical qubits $q_{0},~ q_{1},~ q_{2},~ ~ \text{and~ }q_{3}$ are assigned to $Q_{0},Q_{1},Q_{2},\,\,\,\mathrm{and}\,\,Q_{3}$, respectively. In the first gate $g_{1}$, the physical qubits $Q_{0}$ and $Q_{1}$ are in opposite directions, so the reversal gate is applied. The second gate $g_{2}$ is located on $Q_{1}$ and $Q_{4}$, which are not near each other, so a bridge gate is needed to build a connection of $Q_{1}\rightarrow Q_{2}\rightarrow Q_{4}$. Furthermore, for gate $g_{3}$, $Q_{2}$ and $Q_{1}$ are not neighbors. Thus, a swap gate must be inserted before $g_{3}$ to swap logical qubits $q_{0}$ and $q_{1}$. As depicted in Fig. 4(b), there are 12 new gates added to the quantum circuit, and the circuit size is 15.

NISQ qubit allocation algorithms are considered to be in two categories: variation-aware qubit allocation (VQA) policies and depth optimal qubit allocation (DOQA). VQA concerns qubit allocation algorithms that focus on high fidelity. In contrast, DOQA focuses on lower depth. Consequently, VQA will choose higher fidelity even if the circuit is a long path, and DOQA will choose lower depth (short paths) and does not consider higher fidelity.

## 4. Depth Optimal Qubit Allocation

### 4.1 Naive Algorithm

We can solve the qubit allocation problem by identifying a transformation gate (swap gate) that can be inserted before the unresolved gate or by replacing it (with a reversal or bridge gate). However, searching through this space is challenging. One gate-mapping problem must have at least one solution. In a naive algorithm, we verify each logical qubit against all qubits that satisfy the coupling graph. As an example, a quantum circuit is shown in Fig. 1, and a bidirected coupling graph is shown in Fig. 2(b).

We assume that the initialization maps logical qubits $q_{0}$, $q_{1}$, $q_{2}$, and $q_{3}~ $ to physical qubits $Q_{0}$, $Q_{1}$, $Q_{2}$, and $Q_{3}$, respectively. In the physical qubits, the corresponding CNOT ($q_{1},q_{2}$) gates are not nearby, so a swap operation is needed. There are many possible solutions, such as moving $q_{1}$ close to $q_{2}$, moving $q_{2}$ to $q_{1}$, and moving them both to be close to each other. In the coupling graph in Fig. 2(b), we can only move $q_{1}$ to $q_{2}$ or $q_{2}$ to $q_{1}$. Moving from $q_{2}$ to $q_{1}$ has two solutions: SWAP ($q_{2},q_{3}$) and SWAP ($q_{2},q_{0}$). Moving from $q_{1}$ to $q_{2}$ also has two solutions: SWAP ($q_{1},q_{0}$) and SWAP ($q_{1},q_{3}$). Therefore, the solution of this qubit mapping problem for CNOT ($q_{1}$,$q_{2}$) has four swap candidates.

Without any optimization, we just select one swap gate among them. This swap gate will affect the compilation of the subsequent gates, so we must know how much this one swap gate can increase or decrease the number of gates in the circuit. Moreover, fidelity and execution time are concerns. The process of searching for the optimal swap can require exponential time. In pursuit of the best circuit depth or size, many researchers have applied heuristic search algorithms from graph theory to qubit mapping problems.

### 4.2 Optimization

The well-known solution for qubit allocation is referred to as $\textit{jku}$,
which was presented by Zulehner et al. ^{[14]}. The $\textit{jku}$ algorithm first partitions the circuit into layers. Each layer
$l_{i}$ contains only gates that do not overlap with another gate qubit. In Fig. 5(a), which is generated from Fig. 1, gates $g_{1}$ and $g_{3}$ overlap on qubit $q_{1}$, so the first layer obtains only
gates $g_{0}$ and $g_{1}$. After portioning, if logical qubits of the gate do not
satisfy the physical coupling graph, $\textit{jku}$ finds and inserts additional swap
gates before layer $l_{i}$. The sequence of swap gates is referred to as permutation
layer $\pi _{i}$.

Instead of initializing mapping, $\textit{jku}$ will choose a physical qubit with the lowest cost for a qubit of a gate that has not been mapped yet. $\textit{jku}$ explores through space and attempts to minimize the number of swap gates in $\pi _{i}$. In the worst case, the time complexity is $\mathrm{\mathcal{O}}\left(m!/\left(m-n\right)!\right)$, where $m$ and $n$ are the numbers of physical and logical qubits, respectively.

Because of time complexity, the A* algorithm is applied to prevent searching of the entire search space. In each step, the A* algorithm chooses the swap based on a value of $f\left(x\right)$. $f$ is a value from the sum of two parameters $g\left(x\right)$ and $h\left(x\right)$, while $g$ is the number of swaps that have been added, and $h$ is the heuristic cost. $h$ is determined by the fidelity cost and distance of a physical qubit in the coupling graph with respect to the logical qubit mapping. For optimization, a look-ahead scheme incorporates information of the next layer into the cost function $f\left(x\right)$ of the A* algorithm. This means that it adds a term that estimates the minimal cost of depth for swap gates added to the cost of the A* algorithm.

The $\textbf{$\textit{sabre}$}$ algorithm ^{[15]} is a promising algorithm for qubit mapping with $\mathrm{\mathcal{O}}\left(N^{2.5}g\right)$
time complexity. In preprocessing, $\textit{sabre}$ uses the All-Pairs Shortest Path
(APSP) algorithm to obtain the distance of all pairs in a coupling graph and represents
it as a 2D-array matrix. Next, it represents a two-qubit gate of the quantum circuit
using a Directed Acyclic Graph (DAG).

As shown in Fig. 5(b), the two-qubit (CNOT) gates in each layer act on dependent logical qubits using the same approach as $\textit{jku}$’s partitioning. The two-qubit gates that do not have predecessors are stored in front layers $F$. In a swap-based heuristic search process, $\textit{sabre}$ checks emitting the successor of the gates in $F$. If these gates can be executed on hardware (resolvable gates), then they will be moved to executable gate lists that contain only the gates that can compile on a coupling graph.

Next, $\textit{sabre}$ searches for swap-qubit-mapping solutions and chooses the optimal one based on the heuristic cost function $H$ of each solution. $H$ in $\textit{sabre}$ is designed to select a solution with three considerations: the front layer, extent layers, and the decay effect. For consideration of the front layer, $\textit{sabre}$ computes the average distance of these physical qubits using the Nearest Neighbor Cost Function (NNC), as shown in Eq. (2), where $D$ is a function of NNC and computes the cost of gates applied to $q_{1}$ and $q_{2}$.

##### (2)

$H_{F}=\frac{1}{\left| F\right| }\sum _{gate\in F}D\left[\pi \left(gate,q_{1}\right)\right]\left[\pi \left(gate,q_{2}\right)\right]$To determine how this impacts the next layer, $\textit{sabre}$ computes $H_{E}$ of the extent layer $E_{x}$ that contains the closest gate to the front layer. This computation is the same as $H_{F}$ except that $gate\in E_{x}$. Another consideration of $\textit{sabre}$ is the $\textit{decay effect}$ that is introduced to choose non-overlapping swaps and increase parallelism. If qubit $q_{i}$ has been involved in a swap recently, this decay parameter increases. The cost function $H$ of $\textit{sabre}$ is shown in Eq. (3), where $W$ indicates the extent to which layers are considered, and its value can be tuned from 0 to 1. \uppercase{t}his $H$ can be improved using different methodologies.

##### (3)

$H=\max \left(swap\left(q_{1}\right),swap\left(q_{2}\right)\right)\times \left(H_{F}+W\times H_{E}\right)$
For the initialization of qubit mapping, $\textit{sabre}$ performs random initial
mapping for the first time and then computes the reverse traversal technique after
all the gates are solved. This technique uses the final mapping of the original circuit
as the initial mapping to compute the reverse circuit, and the final mapping of the
reverse circuit is used as the initial mapping for the original circuit. The reverse
circuit reforms the gates of the original circuit from the last gate $g_{i}$ to $g_{0}$.
The initialized mapping can impact the quality of the entire quantum circuit ^{[15]}. Thus, the reverse traversal technique can significantly increase the initial mapping
quality ^{[15,}^{16]}.

The Bounded Mapping Tree ($\textbf{BMT}$) algorithm ^{[18]} is also designed based on graph theory algorithms. $\textit{BMT}$ employs subgraph
isomorphism and a token swapping algorithm. It uses the machine architecture (coupling
graph) and input program (quantum circuit) as inputs. In the program-partitioning
phase, $\textit{BMT}$ performs subgraph isomorphism to find a viable mapping for each
instruction (CNOT gates) to the coupling graph to obtain maximum-isomorphism sub-lists.

Fig. 6 shows that CNOT ($q_{1},q_{0}$) is mapped to a coupling graph with three possible mappings to physical qubits: ($Q_{0},Q_{1}$), ($Q_{0},Q_{3}$), and ($Q_{0},Q_{2}$). If ($q_{1},q_{0}$) is mapped to ($Q_{0},Q_{3}$), $Q_{3}$ is called a dead-end, which does not have another connection. However, if it is mapped to ($Q_{0},Q_{1}$), $Q_{0}$ is still ``alive,'' which indicates we can map another next logical qubit to a neighbor of $Q_{0}$, such as $Q_{2}$ and $Q_{3}$.

For the next instruction ($q_{2},q_{0}$), $q_{0}$ has been mapped already, so there is only $q_{2}$ left. If $(Q_{0},Q_{1})$ has been chosen previously, then there are two possible mappings of $q_{2}$: $Q_{2}$ or $Q_{3}$. We perform this mapping until all instructions are mapped. As a result of the program partitioning process, the dotted box in Fig. 6 indicates the sub-list with maximum subgraph isomorphism as a partition. More precisely, a logical qubit is never mapped to a different physical qubit in the same partition. Note that finding the maximum subgraph isomorphism requires exponential time, so $\textit{BMT}$ constrains the time by manually limiting the quantity of finding the mapping for each qubit. Thus, if the time for finding it is greater than the limitation goal, a random mapping is adopted.

In the combing mapping step, token swapping ^{[26]} is performed to find the optimal swap ($\pi _{i}$) to connect each partition together
$\left(f_{prev}\rightarrow f\right)$. As shown in Fig. 7, the token swapping processes from $f_{prev}$ by swapping $\pi _{i}$ to reach $f$.
The cost of the token swapping algorithm is $\mathrm{\mathcal{O}}\left(\left| Q\right|
^{3}\right)$, where $Q$ is the number of physical qubits. The last phase is code generation
that will be assembling the partitions with swap. This phase is also an alternative
way to find a swap by replacing $\pi _{i}$ using the four-approximative $\Delta $
algorithm to solve the colored token swapping problem because $\Delta $ may produce
a lower cost for swap. Then, $\textit{BMT}$ will choose the least expensive solution
between the first solution and $\Delta $.

## 5. Variation-aware Qubit Allocation

### 5.1 Greedy Approach

Qubit re-allocation (QURE) ^{[19]} is an algorithm that is performed after qubit mapping to improve the fidelity. QURE
is applied to quantum circuits that can be executed directly on target quantum hardware
after applying any DOQA algorithm. $\textit{QURE}$ takes the quantum circuit from
DOQA as an input and provides two algorithms: a greedy approach and a subgraph search.

$\textit{QURE}$’s greedy approach ranks all physical qubits in a coupling graph in descending order based on the average two-qubit gate fidelities of individual qubits. Furthermore, $\textit{QURE}$ ranks the physical qubits by the highest-fidelity score fetched from the quantum hardware. As shown in Fig. 9, it is assumed that a physical qubits are ranked as follows: $Q_{5}$, $Q_{0}$, $Q_{3}$, $Q_{7}$, $Q_{8}$, $Q_{4}$, $Q_{1}$, $Q_{6}$, and $Q_{2}$ ($Q_{5}$ is the optimal qubit with highest fidelity). In logical qubits, it ranks all qubits in descending order based on how many times the two-qubit gate is applied to that qubit. Thus, the ranking list is $q_{2}$, $q_{1}$, $q_{3}$, and $q_{0}$. After ranking the best qubits, $\textit{QURE}$ starts to select the best physical qubit ($Q_{5}$) and logical qubit ($q_{2}$) and then assigns (maps) $q_{2}$${\rightarrow}$$Q_{5}$.

Next, $\textit{QURE}$ looks around at the neighbors of $Q_{5}$ ($Q_{8}$, $Q_{4}$, and $Q_{2}$) and selects the optimal one among them. The same occurs for the logical qubits, among which the neighbors of $q_{2}$ are $q_{1}$ and $q_{0}$, and then $q_{1}$ is selected as an optimal one. In this step, $q_{1}$ is mapped to $Q_{5}$. The selected logical and physical qubits will not be chosen anymore. $\textit{QURE}$ continues looking and assigning until all logical qubits are mapped. Consequently, $\textit{QURE}$ obtains the optimal qubit mapping, as shown in Fig. 9.

Another method from $\textit{QURE}$ is subgraph search, which finds a quantum circuit’s isomorphic subgraph from the coupling graph. This method also performs after logical qubits of the circuit are allocated, but it will be re-allocated to increase fidelity. Accordingly, $\textit{QURE}$ extracts an arbitrary subgraph from the coupling graph (physical qubits) for logical qubits to map, as indicated by the dotted box in Fig. 9. Next, $\textit{QURE}$ computes this mapping with a physical qubit to obtain a fidelity score based on the gate error of each qubit.

As shown in Fig. 9, if the dotted box moves to the left, there is another possible mapping. Moreover, the other two possible mappings are in the dotted box rotated by 90$^{\circ}$. There are four possible mappings in Fig. 9. However, each mapping, such as the vertical ($V$) mapping, can be computed by flipping horizontal ($V_{2}$), flipping vertical ($V_{4}$), and rotating 180$^{\circ}$ ($V_{3}$).

The same occurs for horizontal mapping. More formally, all possible mapping $A$ can be formulated as in Eq. (4). $Q_{v}$ and $q_{v}$ are many physical and logical qubits for vertical mapping, respectively. $Q_{H}$ and $q_{H}$ are for horizontal mapping. Therefore, Fig. 9 shows that there will be $4\times \left(16-2+1\right)\times \left(16-2+1\right)$ or 900 possible mappings. $\textit{QURE}$’s subgraph search will compute all these possible mappings and then select the optimal mapping (high fidelity).

### 5.2 Optimization

A hardware-aware heuristic ($\textit{HA}$) mapping algorithm ^{[16]} inspired by $\textit{sabre}$ efficiently reduces additional gates by 28% and improves
fidelity as well. $\textit{HA}$ performs both swap and bridge gates for mapping. Like
$\textit{sabre}$, $\textit{HA}$ also constructs a quantum circuit to DAG and layers,
as shown in Fig. 5. $\textit{HA}$ follows the $\textit{sabre}$ heuristic search algorithm but simply
modifies the cost function $H$.

$\textit{HA}$ introduces three matrices to estimate the cost of a swap: a swap matrix (number of swaps) $S$, a swap error matrix $E$, and a swap execution time matrix $T$. Along with these three matrices, there are three weights ($\alpha _{1},\alpha _{2},\alpha _{3}$) that can be tuned. More precisely, $S$, $E$, and $T$ are computed using the Floyd-Warshall algorithm that finds the shortest paths of all pairs in a directed graph. Computing the Floyd-Warshall algorithm requires using other weights for each matrix ($W_{E}$,$W_{T}$,$W_{S}$). $W_{E}$ is computed using $e$, an error rate given in the calibration data.

For example, $e\left(Q_{0},Q_{1}\right)=5$ indicates that the minimum error rate between the logical qubits corresponding to $Q_{0}$ moving to $Q_{1}$ is 5. As shown in equation (5), $e_{ij}$ and $e_{ji}$ refer to the error rates of $Q_{i},Q_{j}$ and $Q_{j},Q_{i}$ , respectively.

The execution time cost $W_{T}$ based on the $t$ which is also extracted from hardware and indicates the execution time between $Q_{i}$ to $Q_{j}$ and vice versa.

The last width is $W_{S}$. It is calculated by 1 cost for 1 Swap. For instance, it requires a minimum of 1 swap gate to move the state of $Q_{0}$ to $Q_{3}$ in Fig. 2(b). After computing the Floyd-Warshall algorithm, $\textit{HA}$ calculates the sum of three matrices $E$, $S$, and $T$, which is referred to as the $\textit{distance matrix}$ $D$ in Eq. (7).

Another component is used to find the optimal solution between the bridge and swap gates. Thus, to choose the optimal solution, $\textit{HA}$ computes the $\textit{Effect}$ function, as shown in Eq. (8), where the distance matrix of a bridge ($D_{$\textit{bridge}$}$) is subtracted from $D_{swap}$.

Recently, Zhu et al. introduced two core algorithms: an expansion-from-center
scheme ($\textbf{EFC}$) and the maximum consecutive positive effect ($\textbf{MCPE}$)
of a swap gate as the heuristic cost function ^{[27]}. The $\textit{EFC}$ scheme is used in the initial mapping process. In the first $\textit{EFC}$
stage, the gate index is labeled at the edge of the logical qubit graph depending
on the first applied gate.

As illustrated in Fig. 10, the gate $g_{0}$ firstly acts on $q_{0}$ and $q_{2}$, so these qubits’ edge is the index of $g_{0}$ or 0. These labels are used as weights in the breadth-first search to arrange all logical qubits in a queue starting from the center logical qubit in the graph topology. This center logical qubit is then mapped to the center physical one, and each logical qubit in the queue is mapped to the physical qubits one by one with the closest physical qubit.

In preparatory work for the swap-based heuristic algorithm, the dependency list is generated by counting the dependency gates of each logical qubit, the so-called DAG. Next, the cost distance matrix is computed as a 2D array like $W_{S}$ in $\textit{HA}$ (the number of Swaps from $Q_{i}$ to $Q_{j}$). The core algorithm is used to compute the difference of distance of a gate applied to ($Q_{i},~ Q_{j}$) before and after applying swaps, the so-called $\textit{Effect}$ in Eq. (9). The $dist$ function is obtained from the cost distance matrix.

Assume that the circuit in Fig. 5(a) is mapped to the coupling graph in Fig. 2(b) by initializing mapping from qubit index 0 to 5. For gate $g_{1}$, the distance neighbor from $q_{1}$ to $q_{2}$ in the coupling graph is 1, but after applying SWAP ($q_{0},q_{1}$), the distance is 0. Thus, the effect of this swap operation is 1 minus 0 or 1:

The $\textit{mcpe}$ function computes the sum of the cost effect of every gate on qubits to which the swap operation is applied until $\textit{effect}(g_{i}$) is negative. The range from the first gate to the gate with a negative effect is called the $\textit{window}$. For instance, for the $\textit{mcpe}$ of SWAP($q_{2},q_{0}$) on $q_{2}$, there are three gates: $g_{1}$, $g_{4}$, and $g_{5}$. The effect on $g_{1}$ is 1 (positive), and that on $g_{4}$ is -1 (negative). Thus, in $\textit{window}$, the list contains only $g_{1}$. If there is no positive effective gate, the value of $\textit{mcpe}$ is 0 by default.

A dynamic look-ahead algorithm is introduced as the $\textit{MCPE}$ cost function. $\textit{MCPE}$ illustrates how much SWAP ($Q_{i}$, $Q_{j}$) can affect all the gates on $Q_{i}$ and $Q_{j}$.

As an example, we consider the same previous assumption of using the heuristic search algorithm to solve gate $g_{1}$. There are four solutions: SWAP($q_{0},q_{1}$), SWAP($q_{2},q_{0}$), SWAP($q_{2},$ $q_{3}$), and SWAP($q_{1}$,$q_{3}$). First, we consider the cost function MCPE of SWAP($q_{0},q_{1}$), which corresponds to $Q_{0}$ and $Q_{1}$.

For $mcpe$(SWAP,$q_{0}$), two gates are applied to $q_{0}$: gates $g_{3}$ and $g_{8}$. The nearest-distance qubit in $g_{3}$ before swapping is 0 (nearby), and after applying SWAP($q_{0}$, $q_{1}$), it is 0, so the effect on $g_{3}$ is still 0 in Eq. (9). The distance of $g_{8}$ is 1, and after swapping, it is 0; thus, the effect cost is 1. Therefore, $\textit{mcpe}$(SWAP, $q_{0}$) is 1 in Eq. (10). For $mcpe$(SWAP,$q_{1}$), the cost effect is 2 with three CNOT gates ($g_{1},g_{3},g_{5}$). Therefore, $\textit{MCPE}$(SWAP ($q_{0},q_{1}$)) is 2 + 1 or 3. Both $\textit{MCPEs}$ of SWAP($q_{2},q_{0}$) and SWAP($q_{1},q_{3}$) are 1, and that of SWAP($q_{2},q_{3}$) is 2. Consequently, SWAP ($q_{0},q_{1}$) is selected to be inserted before $g_{1}$.

##### Table 1. Summary of qubit allocation algorithms.

## 6. Performance Evaluation

Table 1 summarizes six qubit allocation algorithms. The first well-known algorithm $\textit{jku}$
^{[14]} divides gates into independent layers and searches all possible solutions to minimize
the sum of the distance between the physical qubits in the layer. It then applies
the A* algorithm to minimize the cost of the depth circuit. The $\textit{jku}$ method
reduces the number of gates by 23% compared with IBM’s approach targeting on IBM QX3
with 16 qubits; however, the time complexity is still exponential. $\textit{Sabre}$
^{[15]} is an efficient method that reduces additional gates by 97% on average for small
circuits and 10% for large circuits compared with $\textit{jku}$. The $\textit{sabre}$
approach computes a swap-based bidirectional heuristic search with a cost function
that ensures flexibility, controllability, scalability, and high-quality initial mapping.

The $\textit{BMT}$ algorithm ^{[17]} exploits subgraph isomorphism for partitioning the circuit and substituting token
swapping with the four-approximate algorithm to find the optimal swap. $\textit{BMT}$
outperforms $\textit{sabre}$ by 16% in decreasing the number of gates and by 35% in
reducing the depth of the circuit. For increasing the fidelity of the final state,
$\textit{QURE}$ ^{[19]} performs reallocation on the top of the compiler when the quantum circuit satisfies
the coupling constraint. $\textit{QURE}$ improves the fidelity of the final state
by 1.54x times over the baseline from Revlib (benchmark resource).

$\textit{MCPE/EFC}$ ^{[27]} uses the $\textit{EFC}$ scheme to determine the initial mapping and compute the look-ahead
cost function that dynamically estimates cost to update the mapping. $\textit{MCPE/EFC}$
outperforms $\textit{Sabre}$ by 22.92% in reducing the number of gates evaluated for
the IBM Q20 Tokyo. It can also be applied to other bidirectional connectivity constraints
of the quantum architecture.

$\textit{HA}$ qubit mapping ^{[16]} inherits the same mapping time complexity of $\mathrm{\mathcal{O}}\left(N^{2.5}g\right)$
from sabre. $\textit{HA}$ is designed as a quantum mapping transition method using
calibration data from a quantum architecture and selects either a swap or bridge gate
for qubit movement. $\textit{HA}$ significantly outperforms $\textit{sabre}$ for IBM
Q Valencia by 8% in fidelity improvement, 19% in reducing execution time, and 38%
in reducing the number of additional gates. For the evaluation in IBM Q Almaden, $\textit{HA}$
decreased the number of gates by 28%.

## 7. Conclusion

The arena of quantum computing will soon reach supremacy by the rapidly increasing use of quantum algorithm applications. There is a gap between quantum circuit execution and imperfect NISQ hardware. In this review paper, we presented six algorithms that try to solve the qubit allocation problem caused by physical qubit hardware constraints. The algorithms can be briefly summarized as follows: $\textit{jku}$ uses the A* algorithm, $\textit{sabre}$ uses a cost function with a distance qubit matrix and reverse traversal initialization mapping, $\textit{BMT}$ solves the NP-complete problem using NP-solutions such as subgraph isomorphism and the token swapping algorithm, and $\textit{MCPE/EFC}$ determines efficient initial mapping by $\textit{EFC}$. $\textit{HA}$ uses swap and bridge gates to satisfy the qubit constraint. Both $\textit{MCPE/EFC}$ and $\textit{HA}$ designed a new cost function from $\textit{sabre}$ that can improve both fidelity and reduce the number of gates. All algorithms were evaluated on NISQ devices with various types of quantum architectures.

### REFERENCES

## Author

Sengthai Heng received his Bachelor's degree in Information Technology from the University of Cambodia, Phnom Penh, Cambodia, in 2019. He presently is a master's degree student of AI convergence at Pukyong National University (PKNU), Pusan, South Korea. His current research interests include quantum computing, high-performance computing, and AI.

Taekyung Kim received his ME and PhD from the Department of Information Industry Engineering, Chungbuk National University, Cheongju, South Korea, in 2005 and 2010, respectively. He was a director at the Korea Software HRD Center, Phnom Penh, Cambodia, from 2013 to 2019. He is currently an assistant professor in the AI Department of Computer & Information Technology, Incheon Jaeneung University, South Korea. His research interests include big data, AI, and SW education.

Youngsun Han received the B.S. and Ph.D. degrees in electrical engineering from the Korea University, Seoul, South Korea, in 2003 and 2009, respectively. He was a senior engineer at the System LSI, Samsung Electronics, Suwon, South Korea, from 2009 to 2011. He was an assistant/associate professor with the Department of Electronic Engineering, Kyungil University, Gyeongsan-si, South Korea, from 2011 to 2019. He is currently an associate professor with the Department of Computer Engineering, Pukyong National University, Busan, South Korea. His research interests include microarchitecture, high-performance computing, and SoC design.