1. Introduction
As the demand for mobile electronic and Internet of Things (IoT) devices has rapidly
grown, energy efficiency and cost have become the key constraints in designing these devices
under the limited battery and power resources. Notably, the security of data transmission
among these devices has become more important as their transmission channel is vulnerable
to security attacks. Hence, cryptographic processing is required for these devices
to protect data for secure communication. Particularly, the low-cost and energy-efficient
cryptographic hardware accelerator is indispensable to efficiently protect and deliver
the transmitted data in the IoT devices. In these accelerators, one of the following
two types of cryptographic algorithms is used for data encryption: symmetric and asymmetric
ciphers. The symmetric cipher, such as Data Encryption Standard (DES) and Advanced
Encryption Standard (AES), uses an identical secret key for encryption and decryption.
In contrast, the asymmetric one uses two different keys, a public key and a private key,
for encryption and decryption. Diffie-Hellman, elliptic curve cryptography (ECC), ElGamal,
and RSA are examples of asymmetric schemes. Between the two, the symmetric cipher is more
suitable for processing large amounts of data as it is faster and more cost-effective.
Among symmetric ciphers, the AES, a block cipher standardized by the National Institute
of Standards and Technology (NIST) in 2001 to replace the DES, is one of the most
widely used encryption algorithms in various security systems due to its efficiency
and simplicity [1]. While the earlier block ciphers, such as DES, Blowfish, and SEED, employ the Feistel
network structure in which encryption consists of the repetition of a specific calculation
function (i.e., round function) in each round, the AES adopts its own network structure.
The overall structure of the AES encryption process is shown in Fig. 1. The AES takes a 128-bit plain text and produces the same-sized cipher text using
a secret key. The AES supports three key lengths of 128, 192, and 256 bits, which are
referred to as AES-128, AES-192, and AES-256, respectively. The number of rounds (N) in
the AES is determined by the key length: 10 rounds for a 128-bit key, 12 rounds for
a 192-bit key, and 14 rounds for a 256-bit key. Before the first round, a single
AddRoundKey transformation is performed, followed by N - 1 rounds of four transformations:
SubByte, ShiftRows, MixColumns, and AddRoundKey. The final round excludes MixColumns.
The decryption process is the inverse of encryption. The decryption includes the inverse
of the encryption transformation except for AddRoundKey, and thus each round has InvShiftRows,
InvMixColumn, InvSubByte, and AddRoundKey. Also, the AES requires a key scheduler
to generate a round key for each round. The input key is expanded into multiple 128-bit
sub-keys through four word-based functions of the key scheduler: RotWord, SubWord,
Rcon, and AddWord. Each word consists of four bytes, and the key scheduler produces
44 words for the 128-bit key, 52 words for the 192-bit key, and 60 words for the 256-bit
key.
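The relationship between key length, round count, and key-schedule size described above can be checked with a short Python sketch. The names `aes_parameters` and `BLOCK_WORDS` are illustrative assumptions; the arithmetic simply reproduces the figures quoted in the text (one 128-bit round key for the initial AddRoundKey plus one per round).

```python
# Illustrative helper: names are assumptions, values follow the text above.
BLOCK_WORDS = 4  # a 128-bit block holds four 32-bit words


def aes_parameters(key_bits):
    """Return (rounds, key_schedule_words) for a 128-, 192-, or 256-bit key."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES key length must be 128, 192, or 256 bits")
    key_words = key_bits // 32      # 4, 6, or 8 words in the input key
    rounds = key_words + 6          # 10, 12, or 14 rounds
    # One round key before the first round, then one round key per round.
    schedule_words = BLOCK_WORDS * (rounds + 1)
    return rounds, schedule_words


for bits in (128, 192, 256):
    rounds, words = aes_parameters(bits)
    print(f"AES-{bits}: {rounds} rounds, {words} key-schedule words")
```

For the three key lengths, the sketch yields 10, 12, and 14 rounds and 44, 52, and 60 words, matching the numbers given above.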
The AES’s key scheduler, encryption, and decryption include the substitution-box (S-box)
or inverse S-box in SubWord, SubByte, and InvSubByte, respectively. Each byte of the
corresponding input is mapped into a new byte value by a non-linear function. Since
the S-box and inverse S-box define a 16${\times}$16 matrix of byte values and can
be implemented using a look-up table (LUT) or a complicated combinational logic, they
occupy a considerable portion of AES hardware. According to our preliminary analysis
on AES hardware, which is shown in Fig. 2, the S-box (i.e., SubByte) and inverse S-box (i.e., InvSubByte) occupy a significant
portion of the total area and power. Specifically, they consume 75% and 69% of the
area, and 50% and 32% of the power in the AES-128 encryption and decryption hardware,
respectively. In decryption, the InvMixColumn takes more power than the InvSubByte as it
requires more complex polynomial multiplication than the MixColumn of encryption. If the
S-box is not implemented as an LUT, the multiplicative inverse in the Galois Field
GF(2$^{8}$) must be computed in logic, for which the field elements of GF(2$^{8}$) are
mapped to an isomorphic composite field. Implementing this composite field arithmetic
may not be efficient in hardware. Therefore, it is worthwhile to implement the AES without
LUT-based S-box and inverse S-box, to design new light-weight replacements for them, and
to adopt these in the AES hardware design to improve the overall hardware efficiency.
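To make the cost of the conventional S-box concrete, the following Python sketch computes an S-box entry arithmetically, as the multiplicative inverse in GF(2$^{8}$) followed by the affine transformation defined in FIPS 197 [1]. It is a plain software illustration, not the composite-field hardware formulation mentioned above, and the function names (`gf_mul`, `gf_inv`, `rotl8`, `sbox`) are assumptions made for this example.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:       # reduce when the degree reaches 8
            a ^= 0x11B
    return result


def gf_inv(a):
    """Multiplicative inverse computed as a^254; 0 maps to 0 by AES convention."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x, n):
    """Rotate an 8-bit value left by n positions."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox(a):
    """AES S-box entry: field inversion followed by the affine transformation."""
    b = gf_inv(a)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


print(hex(sbox(0x00)), hex(sbox(0x01)), hex(sbox(0x53)))
```

The field inversion dominates the work in this sketch, which is why hardware designs either store all 256 results in an LUT or resort to composite-field logic.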
To this end, in this paper, we propose a novel light-weight AES architecture that
replaces the S-box, inverse S-box, and key scheduler with linear feedback shift
register (LFSR)-based counterparts for IoT applications. Our preliminary work on the
LFSR-based AES key scheduler for encryption and decryption was presented in [2]. In this work, we extend our LFSR scheme to other AES encryption and decryption modules
to significantly reduce the hardware resource consumption of the entire AES design.
The proposed AES architecture consists of multiple LFSRs with XOR gates to produce
substitution values and round keys. Our designs, implemented in a 32-nm CMOS technology,
achieve area and energy reductions of up to 57.4% and 75% in the AES-128, 52.2% and
40.2% in the AES-192, and 54.8% and 53.4% in the AES-256, respectively, compared to
the conventional S-box based counterpart.
Fig. 1. AES encryption process.
Fig. 2. Breakdowns of AES-128 encryption and decryption hardware: (a) area; (b) power.
2. Related Works
The AES can be implemented in both software and hardware. Previous research on
AES software has focused on high throughput and on improving the quality of encryption
and decryption [3-6]. Earlier AES hardware, in turn, aimed for low latency and high data throughput using
pipelined or parallel architectures [7-17].
Kumar et al. [3] increased the number of encryption and decryption rounds of the AES to 16 to improve
data security in software AES implementations. Increasing the number of rounds in
the AES takes more computation time but increases the system’s security by making
it more difficult for attackers to break. In [4], the AES electronic codebook (ECB) encryption with three different graphic processing
unit (GPU) architectures (i.e., Kepler, Maxwell, and Pascal) was implemented. Eight
parallel AES architectures, focusing on different parallel granularities and thread
block sizes, were introduced. Exploring the workload distribution over threads
and thread blocks yielded high performance and was especially effective in tuning
the workload per thread in units of AES blocks. Gilmore et al. [5] used neural networks to identify mask values and the secret key with a single attack
trace. Hajihassani et al. [6] proposed a novel AES implementation using a bit-sliced approach. In this work, the
existing row-first form of the input data was replaced with a column-first representation.
Through a parallelization unit, each GPU thread simultaneously processed a number
of thirty-row 128-bit data improving the performance and performance per cost.
Generally, hardware can be implemented as an Application Specific Integrated Circuit
(ASIC) or on a Field Programmable Gate Array (FPGA), and so can the AES hardware [7]. The ASIC is an integrated circuit (IC) designed for a specific application. Compared
to the FPGA, it requires less power, exhibits faster speed, and is suitable for mass
production. However, it lacks the flexibility needed. On the other hand, the FPGA
has flexibility, reconfigurability, and small-volume properties. Earlier studies presented
AES implementations using a pipelined structure, as shown in Fig. 3, to focus on low-latency and high data throughput. Mestiri et al. [8], to prevent fault injection attacks, divided the AES into two parts and inserted
a pipeline between them. The S-box and inverse S-box were independently designed so
that they could be used in logic gates based on Galois Fields and LUTs. Their design
improved the area overhead, frequency, and throughput compared to the existing works.
In [9], a false key-based AES design to withstand a correlation power analysis (CPA) attack
was presented. The wave dynamic differential logic (WDDL)-based XOR gates hid the
data correlated with the false key. Compared to the unprotected AES scheme, the false
key and WDDL assisted AES architecture reduced the performance overhead, power, and
area. Two AES encryption algorithms, adopting iterative looping and pipelining, were
presented in [10]. The partial loop release approach, which used iterations and multistage pipelining,
optimized area, throughput, and dynamic power consumption using only AES-128. In [11], a hardware approach for the AES encryption and decryption suitable for cipher block-chaining
(CBC) mode was proposed. The inverting circuit for the SubByte and InvSubByte was
integrated without any delay overhead. The critical path delay considerably decreased
compared to the previous one, and the proposed approach was effective in terms of
throughput per area due to the new operation-reordering and register-retiming technology.
A new S-box and key generation to enhance the security feature of the AES algorithm
was presented in [12]. The pseudo-noise (PN) sequence generator generated the S-box value and the initial
key required for encryption and decryption. When using the Strict Avalanche Criterion
for 2048, the avalanche effect showed better performance than the traditional AES
design. Compared with the non-pipelined and pipelined models, this approach achieved
higher throughput and smaller area. Three high-throughput AES implementations in ECB
mode and one ultra-high-throughput AES implementation in CTR mode were proposed in
[13]. The designs used GF(2$^{8}$) to create an area-delay efficient multiplier that was
applied to the AES. To achieve high throughput, loop-unrolling, fully pipelining,
and sub pipelining techniques were used by inserting registers in appropriate positions.
Srinivas et al. [14] proposed the AES encryption and decryption adopting pre-computed LUTs. Instead of
using the conventional GF(2$^{8}$) calculation formula, the MixColumns and InvMixColumns
functions in the AES rounds were implemented using LUTs. The proposed design showed
good performance in latency, throughput, area, and power. Liu et al. [15] developed an FPGA-based efficient pipelining AES structure for high-speed protection.
The new key expansion approach increased the key complexity by up to 2N-1 for an N-round
AES, which improved the throughput and security of the AES algorithm. In [16], an S-box based on the multiplexer LUT (MLUT), instead of the existing S-box using
the inverse transform of multiplication of GF(2$^{8}$) and the affine transform, was
proposed. The MLUT was based on a 256-byte to 1-byte multiplexer with 256-byte memory,
so the circuit was simple, and power consumption was low. Since the variance of power
dissipation for different processing data was small, the proposed S-box was also secure
from the CPA-based side-channel attack (SCA). Zhang et al. [17] implemented SubByte and InvSubByte transformations using only combinational logic
instead of S-box based on LUT. In this work, the unbreakable delay caused by the LUT
was eliminated, and the benefits of sub-pipelining were exploited. In addition, they
used composite field arithmetic to reduce hardware complexity and presented a key expansion
architecture suitable for sub-pipelined round units. The proposed design was faster
and achieved higher throughput per slice than the conventional one.
Fig. 3. Pipelined AES-128 architecture.
3. The Proposed AES Architecture
In the AES hardware design, our key contribution is replacing the conventional
S-box and inverse S-box with new low-cost counterparts to greatly reduce hardware resource
consumption. In particular, we use the LFSR to implement SubByte and InvSubByte
for the encryption and decryption rounds of the AES as well as the entire round key
scheduler, as shown in Fig. 4 (see gray boxes). First, Section 3.1 explains the basics of the LFSR and briefly
introduces the two LFSR configurations implemented in the proposed architecture. Then,
the proposed AES design that substitutes the S-box, inverse S-box, and key scheduler
with the LFSRs is presented in Section 3.2.
Fig. 4. Block diagram of AES round.
3.1 Linear Feedback Shift Register
The LFSR is a shift register whose next input bits are calculated as a linear function
of its current state value. The initial value of the LFSR is called a seed, and the
bit that affects the next state of the LFSR is called a tap. Since an n-bit LFSR has
only a finite number of states, its output sequence is periodic. The maximum period
(i.e., length of the loop) is 2$^{\mathrm{n}}$-1 rather than 2$^{\mathrm{n}}$ because
the all-zero state maps to itself under XOR feedback and is therefore excluded. For
example, a maximal-length 8-bit LFSR has a period of 2$^{8}$-1=255.
The LFSR can be characterized by a polynomial. The minimal polynomial of a primitive
element of a finite extension field is called a primitive polynomial. All primitive
polynomials are irreducible because all minimal polynomials are irreducible. For pseudo-random
bit generation, primitive polynomials over GF(2), the field with two elements, can
be employed, and an LFSR built from a primitive polynomial of degree n attains the
maximum period of 2$^{\mathrm{n}}$-1. If the taps of the LFSR
are at the 8$^{\mathrm{th}}$, 6$^{\mathrm{th}}$, 5$^{\mathrm{th}}$, and 4$^{\mathrm{th}}$
bits, the characteristic polynomial of the LFSR is defined by
P$_{8}$(x) = x$^{8}$ + x$^{6}$ + x$^{5}$ + x$^{4}$ + 1 (1)
Since the LFSR output values appear in a pseudo-random sequence, the LFSR can be used
as a random number generator. Hence, it may be a suitable alternative to the AES S-box
that looks to be a random substitution.
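The period of an 8-bit LFSR with taps at the 8th, 6th, 5th, and 4th bits, i.e., with characteristic polynomial x$^{8}$ + x$^{6}$ + x$^{5}$ + x$^{4}$ + 1, can be checked with a minimal Python sketch. The right-shifting bit ordering and the names `fibonacci_lfsr_step` and `period` are assumptions made for this example and need not match the register orientation of the hardware.

```python
def fibonacci_lfsr_step(state):
    """Advance an 8-bit Fibonacci LFSR with taps at bits 8, 6, 5, and 4."""
    # In a right-shifting register, tap k sits at bit position 8 - k.
    feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 4)) & 1
    return (state >> 1) | (feedback << 7)


def period(step, seed):
    """Count the steps needed for the register to return to its seed."""
    state, count = step(seed), 1
    while state != seed:
        state = step(state)
        count += 1
    return count


print(period(fibonacci_lfsr_step, 0x01))  # prints 255
print(fibonacci_lfsr_step(0x00))          # prints 0: the all-zero state is stuck
```

Every nonzero seed lies on the same cycle of length 255, whereas the all-zero seed never leaves zero, which is why the period is 2$^{8}$-1 rather than 2$^{8}$.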
There are two methods to implement the LFSR: Fibonacci and Galois. Fig. 5 shows examples of the 8-bit LFSRs implemented using these two methods. In the Fibonacci
method, the tap bits are XORed together with the least significant bit (LSB), and the
result is fed into the most significant bit (MSB) as the register shifts. On the other
hand, in the Galois method, the LSB is fed back to the MSB, and each tap bit is XORed
with the LSB as it is shifted into the next bit position.
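The Galois form can be sketched in the same way. In the minimal Python example below, the toggle mask 0xB8 encodes taps 8, 6, 5, and 4 for a right-shifting register; the mask layout and the names `GALOIS_MASK`, `galois_lfsr_step`, and `period` are assumptions of this sketch rather than details taken from Fig. 5.

```python
GALOIS_MASK = 0xB8  # bits 7, 5, 4, 3 <-> taps 8, 6, 5, 4 (assumed layout)


def galois_lfsr_step(state):
    """Advance an 8-bit Galois LFSR by one shift."""
    lsb = state & 1
    state >>= 1
    if lsb:
        # The bit shifted out is XORed into every tap position at once.
        state ^= GALOIS_MASK
    return state


def period(step, seed):
    """Count the steps needed for the register to return to its seed."""
    state, count = step(seed), 1
    while state != seed:
        state = step(state)
        count += 1
    return count


print(period(galois_lfsr_step, 0x01))  # prints 255
```

Unlike the Fibonacci form, in which the taps are XORed in a chain, the Galois form applies its XORs in parallel, so at most one XOR gate lies between any two flip-flops.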
Fig. 5. Example implementations of 8-bit LFSRs with (a) Fibonacci; (b) Galois.
3.2 The Proposed AES Architecture
Fig. 6(a) shows the proposed S-box architecture using the Fibonacci LFSR for the SubByte of
the round. Also, this LFSR structure is adopted in the round key scheduler. It includes
16 8-bit LFSRs with XOR gates. Note that the LFSR is based on Eq. (1), which is derived from the list of degree 8 primitive polynomials. The LFSR can also
be implemented using the Galois method. Each 8-bit LFSR contains 8 D-flipflops and
3 XOR gates. A 128-bit input is divided into sixteen 8-bit segments, each of which is fed into one LFSR.
The least significant 8 bits are taken as the input to the least significant LFSR to
generate the corresponding 8-bit partial output. Each LFSR output is XORed with 8-bit
partial input, and its output is fed into the next LFSR to produce the corresponding
partial 8-bit output. These are concatenated to form a 128-bit output, which is utilized
for the SubByte for encryption and round keys. Note that additional LFSRs with XOR
gates are required to produce the 192-bit and 256-bit output for use in the AES-192
and AES-256 key schedulers. Importantly, instead of using 8-bit LFSRs, we can also
use 16- and 32-bit LFSRs that include 16 and 32 D-flipflops, respectively, each with 3 XOR gates.
Eight 16-bit and four 32-bit LFSRs are required to generate a 128-bit output. For
the AES-192 and AES-256 key schedulers, the 16-bit LFSR-based design needs additional
4 and 8 LFSRs, and 32-bit LFSR-based one requires additional 2 and 4 LFSRs, respectively,
to produce 192-bit and 256-bit outputs. Also, the characteristic polynomials of the 16-bit
LFSR, P$_{16}$(x), and the 32-bit LFSR, P$_{32}$(x), are as follows:
Note that P$_{16}$(x) and P$_{32}$(x) are taken from the lists of primitive polynomials
of degree 16 and 32. The inverse S-box architecture using the 8-bit Fibonacci LFSR for
InvSubByte is shown in Fig. 6(b). The inverse of the LFSR used for the AES encryption is utilized for the InvSubByte
for decryption. Similar to the proposed S-Box, the 128-bit input is split into multiple
pieces. Each inverse LFSR takes 8 bits as the input, and its output is XORed with the
input to produce the corresponding same-sized partial output. These are combined to
generate a 128-bit output for InvSubByte used in decryption. Similarly, the inverse LFSR
can be implemented by both the Fibonacci and Galois methods, and 16-bit and 32-bit inverse
LFSRs can be leveraged to implement the inverse S-box. Also, the number of inverse
LFSRs needed for InvSubByte in the AES-128, AES-192, and AES-256 is the same as the
number of LFSRs required for SubByte.
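One possible behavioural reading of the chained datapath is sketched below in Python. The exact wiring of Fig. 6 cannot be recovered from the text alone, so the chaining order, the use of a single LFSR step per stage, the bit ordering, and all function names are assumptions of this sketch; it only illustrates that a chain of LFSR stages with XOR gates can be undone by a matching chain of inverse LFSR stages.

```python
def lfsr_step(state):
    """One step of an 8-bit Fibonacci LFSR (taps 8, 6, 5, 4; right-shifting)."""
    feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 4)) & 1
    return (state >> 1) | (feedback << 7)


def inverse_lfsr_step(state):
    """Undo one lfsr_step by recovering the bit that was shifted out."""
    lost = ((state >> 7) ^ (state >> 1) ^ (state >> 2) ^ (state >> 3)) & 1
    return ((state << 1) & 0xFF) | lost


def substitute(block):
    """Assumed forward path: XOR each input byte with the previous stage's
    output and pass the result through one LFSR step."""
    out, carry = [], 0
    for byte in block:              # least significant byte first
        carry = lfsr_step(byte ^ carry)
        out.append(carry)
    return bytes(out)


def inverse_substitute(block):
    """Assumed inverse path: inverse LFSR step, then XOR with the previous
    input byte of the inverse chain."""
    out, prev = [], 0
    for byte in block:
        out.append(inverse_lfsr_step(byte) ^ prev)
        prev = byte
    return bytes(out)


plain = bytes(range(16))            # one 128-bit block as 16 bytes
assert inverse_substitute(substitute(plain)) == plain
```

Under these assumptions, the inverse stage mirrors the description above: each inverse LFSR takes an 8-bit input, and its output is XORed with an input byte to recover the corresponding 8-bit segment.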
Fig. 6. Proposed S-Box and Inverse S-Box architectures for (a) SubByte/SubWord; (b) InvSubByte with 8-bit Fibonacci LFSRs.
4. Experimental Results
The proposed AES design was implemented in Verilog HDL and synthesized using a 32-nm
CMOS technology to obtain the hardware resource consumption. For comparison, we also
implemented the conventional AES design using the same design methodology. In our
design, the LFSRs were implemented using both the Fibonacci
and Galois methods to compare them with each other. In addition, three different designs
according to key size, namely AES-128, AES-192, and AES-256, were implemented. The AES
architecture with the proposed S-box and inverse S-box was functionally verified by
confirming that the encrypted texts are correctly decrypted to the original plain texts.
Table 1 summarizes the hardware resource consumption of the proposed AES with various design
configurations and the conventional AES. We compare the conventional AES design with
our AES-128/192/256 ones with 8-, 16-, and 32-bit Fibonacci and Galois LFSRs. The
proposed AES design reduces the area by more than 50% compared to the traditional
design in all the AES design configurations. It also reduces the power and energy in
every configuration and the delay in all configurations except the 8-bit AES-128
designs. Specifically, our AES-128 design with Fibonacci LFSR reduces the
area, power, and energy by 55.9%, 26.1%, and 20% in the 8-bit configuration, and 56.7%,
28.3%, and 64.3% in the 16-bit configuration, respectively. Furthermore, compared
to the conventional AES, the area, power, and energy reduction are 57.4%, 29.8%, and
66.7% in the 32-bit LFSR, respectively. In addition, when the Galois LFSR is used,
in the 8-bit design, it decreases the area, power, and energy by 55.5%, 25.8%, and
16.6%, while in 16-bit, it reduces them by 56.7%, 28.9%, and 57.5%, respectively.
Similarly, the reduction of area, power, and energy is by 57.4%, 30.1%, and 75% in
the 32-bit Galois LFSR, respectively. In terms of delay, the 8-bit Fibonacci and
Galois LFSRs increase the delay by 8.4% and 12.4%, respectively.
However, when the 16-bit and 32-bit LFSRs are used, the delay decreases by 50.2% and 52.6%
in the Fibonacci and 40.2% and 64.1% in the Galois, respectively. Compared with the
conventional AES-192 design, the LFSR-based AES-192 designs decrease the delay
by 19.5% in both Fibonacci and Galois LFSRs. Moreover, the area, power, and energy,
when adopting the Fibonacci method, reduce by 50.8%, 21.2%, and 36.6% for the 8-bit
LFSR, and 51.5%, 23.5%, and 38.4% for the 16-bit design, respectively. The area, power,
and energy also reduce by 52.1%, 25.3%, and 39.8% using the 32-bit LFSR, respectively.
Likewise, the 8-bit Galois method reduces the area, power, and energy by 50.9%,
22.8%, and 37.8%, and the 16-bit Galois reduces them by 51.6%, 24.2%, and 39%, respectively.
Similarly, in the 32-bit design, the area, power, and energy reduce by 52.2%, 25.7%,
and 40.2%, respectively. In the case of the AES-256 design, both the Fibonacci and
Galois LFSRs reduce the delay by 11.2% in the 8-bit and 16-bit designs, and 35.7%
in the 32-bit one, compared to the traditional design. In addition, when using the
Fibonacci LFSR, the 8-bit design reduces the area, power, and energy by 53.4%, 23.6%,
and 32.2%, the 16-bit one reduces them by 54.1%, 25.6%, and 33.9%, and the 32-bit
design reduces them by 54.8%, 27.2%, and 53.1%, respectively. Besides, when the Galois
LFSR is used, the area, power, and energy decrease by 53.6%, 26.4%, and 34.6% in the
8-bit design, 54.2%, 27.2%, and 35.3% in the 16-bit one, and 54.8%, 27.6%, and 53.4%
in the 32-bit, respectively.
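The percentages in this paragraph follow directly from Table 1. As a minimal Python sketch (the function name `reduction_percent` and the tuple layout are assumptions of this example), the AES-128 figures for the 32-bit Galois LFSR can be reproduced as follows; the sketch also checks that the energy column equals power multiplied by delay.

```python
def reduction_percent(conventional, proposed):
    """Relative reduction of a metric, in percent, rounded to one decimal."""
    return round(100.0 * (conventional - proposed) / conventional, 1)


# AES-128 rows of Table 1: (area um^2, delay ns, power mW, energy pJ)
conventional = (57407, 2.51, 8.3, 20.83)
galois_32bit = (24449, 0.9, 5.8, 5.22)

labels = ("area", "delay", "power", "energy")
for label, base, new in zip(labels, conventional, galois_32bit):
    print(f"{label}: {reduction_percent(base, new)}% reduction")

# Energy (pJ) is power (mW) times delay (ns) in both rows.
for _, delay, power, energy in (conventional, galois_32bit):
    assert abs(power * delay - energy) < 0.01
```

The area, delay, and power reductions evaluate to 57.4%, 64.1%, and 30.1%; the energy reduction evaluates to 74.9% from the rounded table entries, which the text reports as 75%.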
We obtained the area-delay product (ADP) to jointly compare the AES designs in terms
of area and delay. Fig. 7 exhibits the ADP of the conventional AES and the proposed designs. In the case of
the AES-128, the designs adopting 8-, 16-, and 32-bit Fibonacci LFSRs reduce the ADP
by 52.2%, 78.4%, and 79.8%, respectively, compared to the conventional one.
Also, the Galois LFSR reduces the ADP by 50%, 74.1%, and 84.7% in the 8-,
16-, and 32-bit designs, respectively. In addition, the AES-192 designs improve the ADP by
60.4%, 60.9%, and 61.5% when the 8-, 16-, and 32-bit Fibonacci LFSRs are used, respectively.
With the Galois LFSR, the ADP improves by 60.5%, 61%, and 61.5% in the
8-, 16-, and 32-bit designs, respectively. The 8-, 16-, and 32-bit Fibonacci LFSR-based
AES-256 designs enhance the ADP by 58.6%, 59.2%, and 70.9%, respectively. Additionally,
the Galois LFSR improves the ADP by 58.8%, 59.3%, and 70.9% in the
8-, 16-, and 32-bit designs, respectively.
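The ADP improvements can be recomputed from the area and delay columns of Table 1. The following Python sketch does so for the AES-128 designs; the names `adp`, `adp_improvement`, and `designs_128` are assumptions of this example, and the inputs are the rounded table entries.

```python
def adp(area_um2, delay_ns):
    """Area-delay product of one design."""
    return area_um2 * delay_ns


def adp_improvement(conventional, proposed):
    """ADP reduction relative to the conventional design, in percent."""
    base = adp(*conventional)
    return round(100.0 * (base - adp(*proposed)) / base, 1)


# AES-128 rows of Table 1: (area um^2, delay ns)
conventional_128 = (57407, 2.51)
designs_128 = {
    "Fibonacci 8-bit": (25317, 2.72),
    "Fibonacci 16-bit": (24884, 1.25),
    "Fibonacci 32-bit": (24473, 1.19),
    "Galois 8-bit": (25571, 2.82),
    "Galois 16-bit": (24835, 1.5),
    "Galois 32-bit": (24449, 0.9),
}

for name, design in designs_128.items():
    print(f"{name}: {adp_improvement(conventional_128, design)}% lower ADP")
```

Although the 8-bit designs are slower than the conventional AES-128, their ADP still improves by about half because the area reduction outweighs the added delay.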
In addition to the ADP, the energy-delay product (EDP) was calculated to examine the
design from an energy efficiency perspective. The EDP comparison of the proposed design
and the traditional AES is shown in Fig. 8. The AES-128 designs with the 8-, 16-, and 32-bit Fibonacci LFSRs decrease the EDP
by 40.9%, 74.4%, and 76.6% compared to the traditional AES, respectively. On the other
hand, the Galois LFSR improves the EDP by 38.1%, 69.8%, and 82.5% using the 8-, 16-,
and 32-bit designs, respectively. In addition, when the 8-, 16-, and 32-bit Fibonacci
and Galois LFSRs are employed in the AES-192, the EDP decreases by 50%, 52.9%, and
55.1% in the Fibonacci and 52%, 53.7%, and 55.5% in the Galois, respectively, compared
to the existing design. Furthermore, the proposed AES-256 design shows better EDP
performance of 48.2%, 50.8%, and 65.9% in the 8-, 16-, and 32-bit Fibonacci LFSRs
and 51.9%, 52.9%, and 66.3% using the 8-, 16-, and 32-bit Galois LFSR, respectively.
The experimental results confirm that the 8-bit Galois LFSR in the AES-128 shows the
smallest performance improvement and the 8-bit Fibonacci LFSR in the AES-192/256 results
in the least performance improvement. Moreover, the 32-bit Galois LFSR among the proposed
AES designs offers the greatest hardware efficiency.
Fig. 7. Comparison of area-delay product (ADP) of various AES designs.
Fig. 8. Comparison of energy-delay product (EDP) of various AES designs.
Table 1. Comparison of AES hardware resource consumption.
| Design | LFSR | AES-128 Area (μm$^{2}$) | AES-128 Delay (ns) | AES-128 Power (mW) | AES-128 Energy (pJ) | AES-192 Area (μm$^{2}$) | AES-192 Delay (ns) | AES-192 Power (mW) | AES-192 Energy (pJ) | AES-256 Area (μm$^{2}$) | AES-256 Delay (ns) | AES-256 Power (mW) | AES-256 Energy (pJ) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Conventional Design | - | 57407 | 2.51 | 8.3 | 20.83 | 66830 | 1.13 | 10.04 | 11.35 | 76525 | 1.43 | 11.52 | 16.47 |
| Fibonacci LFSR based Design | 8-bit | 25317 | 2.72 | 6.13 | 16.67 | 32902 | 0.91 | 7.91 | 7.20 | 35673 | 1.27 | 8.8 | 11.18 |
| Fibonacci LFSR based Design | 16-bit | 24884 | 1.25 | 5.95 | 7.44 | 32428 | 0.91 | 7.68 | 6.99 | 35133 | 1.27 | 8.57 | 10.88 |
| Fibonacci LFSR based Design | 32-bit | 24473 | 1.19 | 5.83 | 6.94 | 31984 | 0.91 | 7.5 | 6.83 | 34627 | 0.92 | 8.39 | 7.72 |
| Galois LFSR based Design | 8-bit | 25571 | 2.82 | 6.16 | 17.37 | 32792 | 0.91 | 7.75 | 7.05 | 35543 | 1.27 | 8.48 | 10.77 |
| Galois LFSR based Design | 16-bit | 24835 | 1.5 | 5.9 | 8.85 | 32373 | 0.91 | 7.61 | 6.93 | 35067 | 1.27 | 8.39 | 10.66 |
| Galois LFSR based Design | 32-bit | 24449 | 0.9 | 5.8 | 5.22 | 31957 | 0.91 | 7.46 | 6.79 | 34594 | 0.92 | 8.37 | 7.67 |
5. Conclusion
We proposed a novel low-cost AES architecture exploiting an LFSR-based S-box for IoT
applications. The existing S-Box and inverse S-box of the AES are replaced by the
proposed LFSR-based designs to reduce the hardware resource consumption significantly.
In addition, the round key scheduler is replaced with the LFSR-based design. The various
sized LFSRs using Fibonacci and Galois methods are used in designing the SubByte of
the AES encryption round, InvSubByte of the decryption round, and the entire key scheduler.
To compare the proposed design with the conventional S-box and inverse S-box based
AES one, we implemented various designs in a 32-nm CMOS technology and functionally
verified the AES encryption and decryption processes. The results show that the proposed
designs reduce the area, delay, power, and energy by up to 57.4%, 64.1%, 30.1%, and 75%
in the AES-128, 52.2%, 19.5%, 25.7%, and 40.2% in the AES-192, and 54.8%, 35.7%, 27.6%,
and 53.4% in the AES-256, respectively, compared to the traditional one. Also, in
terms of ADP and EDP, the AES-128 shows better ADP and EDP efficiencies of at least
50% and 38.1%, with a maximum of 84.7% and 82.5%, respectively. Also, the AES-192
achieves ADP and EDP efficiencies of at least 60.4% and 50%, with a maximum of 61.5%
and 55.5%, respectively. The AES-256 improves the ADP and EDP performances by at least
58.6% and 48.2%, with a maximum of 70.9% and 66.3%, respectively. According to our
experiment, the hardware performance of the AES-128, AES-192, and AES-256 with the
32-bit Galois LFSR demonstrates the best efficiency among the various design configurations.
Moreover, they show significantly improved hardware efficiency compared to the existing
ones. Accordingly, the proposed LFSR-based S-box and inverse S-box can be easily applied
to the AES hardware design to reduce the hardware overhead considerably. Moreover,
the proposed AES architecture is suitable for providing security for IoT devices.
ACKNOWLEDGMENTS
This work was supported in part by the Basic Science Research Program through
the National Research Foundation of Korea (NRF) funded by the Ministry of Education
(NRF-2019R1I1A3A01061266) and in part by the BK21 FOUR project (AI-driven Convergence
Software Education Research Program) funded by the Ministry of Education, School of
Computer Science and Engineering, Kyungpook National University, Korea (4199990214394).
REFERENCES
[1] National Institute of Standards and Technology (NIST), Nov. 2001, Advanced Encryption Standard (AES), Federal Information Processing Standards (FIPS) Publication 197
[2] Lee D., Kim Y., 2021, Design of a Light-Weight Key Scheduler for AES using LFSR for IoT Applications, IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia), pp. 1-2
[3] Kumar P., Rana S. B., 2016, Development of Modified AES Algorithm for Data Security, Optik, Vol. 127, No. 4, pp. 2341-2345
[4] Abdelrahman A. A., Fouad M. M., Dahshan H., Mousa A. M., 2017, High Performance CUDA AES Implementation: A Quantitative Performance Analysis Approach, Computing Conference, pp. 1077-1085
[5] Gilmore R., Hanley N., O'Neill M., 2015, Neural Network based Attack on a Masked Implementation of AES, IEEE International Symposium on Hardware Oriented Security and Trust (HOST), pp. 106-111
[6] Hajihassani O., Monfared S. K., Khasteh S. H., Gorgin S., 2019, Fast AES Implementation: A High-Throughput Bitsliced Approach, IEEE Transactions on Parallel and Distributed Systems, Vol. 30, No. 10, pp. 2211-2222
[7] Gaj K., Chodowiec P., 2009, FPGA and ASIC Implementations of AES, Cryptographic Engineering, pp. 235-294
[8] Mestiri H., et al., 2016, A High-Speed AES Design Resistant to Fault Injection Attacks, Microprocessors and Microsystems, Vol. 41, pp. 47-55
[9] Yu W., Köse S., 2017, A Lightweight Masked AES Implementation for Securing IoT Against CPA Attacks, IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 64, No. 11, pp. 2934-2944
[10] Soliman S. M., Magdy B., Abd El Ghany M. A., 2016, Efficient Implementation of the AES Algorithm for Security Applications, IEEE International System-on-Chip Conference (SOCC), pp. 206-210
[11] Ueno R., Morioka S., Homma N., Aoki T., 2016, A High Throughput/Gate AES Hardware Architecture by Compressing Encryption and Decryption Datapaths, International Conference on Cryptographic Hardware and Embedded Systems, pp. 538-558
[12] Zodpe H., Sapkal A., 2020, An Efficient AES Implementation using FPGA with Enhanced Security Features, Journal of King Saud University - Engineering Sciences, Vol. 32, No. 2, pp. 115-122
[13] Soltani A., Sharifian S., 2015, An Ultra-High Throughput and Fully Pipelined Implementation of AES Algorithm on FPGA, Microprocessors and Microsystems, Vol. 39, No. 7, pp. 480-493
[14] Srinivas N. S. S., Akramuddin M., 2016, FPGA based Hardware Implementation of AES Rijndael Algorithm for Encryption and Decryption, International Conference on Electrical Electronics and Optimization Techniques (ICEEOT), pp. 1769-1776
[15] Liu Q., Xu Z., Yuan Y., 2015, High Throughput and Secure Advanced Encryption Standard on Field Programmable Gate Array with Fine Pipelining and Enhanced Key Expansion, IET Computers & Digital Techniques, Vol. 9, No. 3, pp. 175-184
[16] Pammu A. A., Chong K. -S., Lwin Ne K. Z., Gwee B. -H., 2016, High Secured Low Power Multiplexer-LUT Based AES S-Box Implementation, International Conference on Information Systems Engineering (ICISE), pp. 3-7
[17] Zhang X., Parhi K. K., 2004, High-Speed VLSI Architectures for the AES Algorithm, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 12, No. 9, pp. 957-967
Author
Donghui Lee received his B.S. degree in the Department of Electric and Electronic
Engineering from Halla University, Wonju, the Republic of Korea in 2020. He is pursuing
an M.S. degree in the School of Computer Science and Engineering at Kyungpook National
University, Daegu, Republic of Korea. His research interests include artificial intelligence
(AI) accelerators and hardware design.
Myeongjin Kwak received the B.S. degree from the School of Computer Science and
Engineering at Kyungpook National University, Daegu, the Republic of Korea in 2021,
where he is currently pursuing an M.S. degree. His research interests include machine
learning and neuromorphic computing.
Jungwon Lee received the B.S. degree from the School of Computer Science and Engineering
from Kyungpook National University, Daegu, the Republic of Korea in 2021, where she
is currently pursuing an M.S. degree. Her research interests include deep learning
and approximate arithmetic.
Beomjun Kim received his B.S. degree from the School of Computer Science and Engineering
at the Kyungpook National University, Daegu, South Korea. Now he is pursuing an M.E.
degree in the School of Computer Science and Engineering at the Kyungpook National
University. His research interests include non-volatile memory, heterogeneous memory
system, data compression, and computer architecture.
Yongtae Kim received the B.S. and M.S. degrees in electrical engineering from
Korea University, Seoul, Republic of Korea, in 2007 and 2009, respectively, and the
Ph.D. degree from the Department of Electrical and Computer Engineering at Texas
A&M University, College Station, TX, in 2013. From 2013 to 2018, he was a software
engineer with Intel Corporation, Santa Clara, CA. Since 2018, he has been with the
School of Computer Science and Engineering at Kyungpook National University, Daegu,
the Republic of Korea, where he is currently an assistant professor. His research
interests are energy-efficient integrated circuits and systems, particularly neuromorphic
computing and approximate computing, and new memory devices and architectures.