
1. (School of Computer Science and Engineering, Kyungpook National University, Daegu, Korea)

Advanced encryption standard (AES), Linear feedback shift register (LFSR), Substitution-box (S-Box), Key scheduler, Cryptographic, Symmetric block cipher

## 1. Introduction

As the demand for mobile electronics and Internet of Things (IoT) devices has grown rapidly, energy efficiency and cost have become the key constraints in designing these devices under limited battery and power resources. Notably, the security of data transmission among these devices has become more important because their transmission channels are vulnerable to security attacks. Hence, these devices require cryptographic processing to protect data for secure communication. In particular, a low-cost and energy-efficient cryptographic hardware accelerator is indispensable for efficiently protecting and delivering the transmitted data in IoT devices. These accelerators use one of two types of cryptographic algorithms for data encryption: symmetric or asymmetric ciphers. A symmetric cipher, such as the Data Encryption Standard (DES) or the Advanced Encryption Standard (AES), uses an identical secret key for encryption and decryption. In contrast, an asymmetric cipher uses two different keys, a public key and a private key, for encryption and decryption. Diffie-Hellman, elliptic curve cryptography (ECC), ElGamal, and RSA are examples of asymmetric schemes. Between the two, the symmetric cipher is more suitable for processing large amounts of data as it is faster and more cost-effective.

Among symmetric ciphers, the AES, a block cipher standardized by the National Institute of Standards and Technology (NIST) in 2001 to replace the DES, is one of the most widely used encryption algorithms in various security systems due to its efficiency and simplicity [1]. While earlier block ciphers, such as DES, Blowfish, and SEED, employ the Feistel network structure, in which encryption consists of the repetition of a specific calculation function (i.e., round function) in each round, the AES adopts a substitution-permutation network structure. The overall structure of the AES encryption process is shown in Fig. 1. The AES takes a 128-bit plain text and produces a same-sized cipher text using a secret key. The AES has three variants, AES-128, AES-192, and AES-256, which correspond to key lengths of 128, 192, and 256 bits, respectively. The number of rounds (N) in the AES is determined by the key length: 10 rounds for a 128-bit key, 12 rounds for a 192-bit key, and 14 rounds for a 256-bit key. Before the first round, a single AddRoundKey transformation is performed, followed by N - 1 rounds of four transformations: SubByte, ShiftRows, MixColumns, and AddRoundKey. The final round excludes MixColumns. The decryption process is the inverse of encryption. It uses the inverse of each encryption transformation except for AddRoundKey, which is its own inverse, and thus each round has InvShiftRows, InvMixColumns, InvSubByte, and AddRoundKey. Also, the AES requires a key scheduler to generate a round key for each round. The input key is expanded into multiple 128-bit sub-keys through four word-based functions of the key scheduler: RotWord, SubWord, Rcon, and AddWord. Each word consists of four bytes, and the key scheduler processes 44 words for the 128-bit key, 52 words for the 192-bit key, and 60 words for the 256-bit key.
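The relationship among key length, round count, and key-schedule size quoted above follows a simple rule given in FIPS 197 [1]. The short Python sketch below reproduces these numbers; the function name `aes_parameters` is our own and is used only for illustration.

```python
def aes_parameters(key_bits):
    """Return (rounds, key_schedule_words) for a given AES key length."""
    if key_bits not in (128, 192, 256):
        raise ValueError("AES key length must be 128, 192, or 256 bits")
    nk = key_bits // 32        # key length measured in 32-bit words
    rounds = nk + 6            # 10, 12, or 14 rounds
    words = 4 * (rounds + 1)   # one 4-word round key per AddRoundKey
    return rounds, words


for bits in (128, 192, 256):
    rounds, words = aes_parameters(bits)
    print(f"AES-{bits}: {rounds} rounds, {words} key-schedule words")
```

The extra round key in `rounds + 1` accounts for the initial AddRoundKey performed before the first round.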

The AES key scheduler, encryption, and decryption include the substitution-box (S-box) or inverse S-box in SubWord, SubByte, and InvSubByte, respectively. Each byte of the corresponding input is mapped into a new byte value by a non-linear function. Since the S-box and inverse S-box each define a 16${\times}$16 matrix of byte values and are implemented using either a look-up table (LUT) or complicated combinational logic, they occupy a considerable portion of the AES hardware. According to our preliminary analysis of AES hardware, which is shown in Fig. 2, the S-box (i.e., SubByte) and inverse S-box (i.e., InvSubByte) occupy a significant portion of the total area and power. Specifically, they consume 75% and 69% of the area, and 50% and 32% of the power in the AES-128 encryption and decryption hardware, respectively. In decryption, InvMixColumns takes more power than InvSubByte as it requires more complex polynomial multiplication than the MixColumns of encryption. If the S-box is not expressed as a LUT, the Galois Field GF(2$^{8}$) inverse operation must be implemented in logic, which is typically done by mapping the field elements of GF(2$^{8}$) to an isomorphic composite field. Implementing this composite-field arithmetic may not be efficient in hardware. Therefore, it is meaningful to design new light-weight S-box and inverse S-box structures that use neither LUTs nor composite-field logic and to adopt them in the AES hardware design to improve the overall hardware efficiency.
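For reference, the conventional S-box that a LUT stores can also be computed as a GF(2$^{8}$) inversion followed by an affine transformation, as specified in FIPS 197 [1]. The following Python sketch is a software behavioral model of that computation, not the composite-field hardware discussed above; the helper names `gf_mul`, `gf_inv`, `rotl8`, and `sbox` are our own.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse computed as a^254; zero is mapped to zero."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x, n):
    """Rotate an 8-bit value left by n positions."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox(byte):
    """AES S-box: field inversion followed by the affine transformation."""
    b = gf_inv(byte)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


SBOX = [sbox(x) for x in range(256)]
print(hex(SBOX[0x00]), hex(SBOX[0x53]))
```

The 256-entry `SBOX` list corresponds to the 16${\times}$16 matrix of byte values mentioned above; the inversion step is the part that is costly to realize in combinational logic.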

To this end, in this paper, we propose a novel light-weight AES architecture for IoT applications that replaces the S-box, inverse S-box, and key scheduler with linear feedback shift register (LFSR)-based counterparts. Our preliminary work on the LFSR-based AES key scheduler for encryption and decryption was presented in [2]. In this work, we extend our LFSR scheme to other AES encryption and decryption modules to significantly reduce the hardware resource consumption of the entire AES design. The proposed AES architecture consists of multiple LFSRs with XOR gates to produce substitution values and round keys. Our designs, implemented in a 32-nm CMOS technology, achieve area and energy reductions of up to 57.4% and 75% in the AES-128, 52.2% and 31.7% in the AES-192, and 54.8% and 27.4% in the AES-256, respectively, compared to the conventional S-box based counterpart.

## 2. Related Works

The AES can be implemented in both software and hardware. Previous research on AES software has focused on high throughput and on improving the quality of encryption and decryption [3-6]. Earlier AES hardware, in turn, aimed for low latency and high data throughput using pipelined or parallel architectures [7-17].

Kumar et al. [3] increased the number of encryption and decryption rounds of the AES to 16 to improve data security in software AES implementations. Increasing the number of rounds in the AES takes more computation time but increases the system’s security by making it more difficult for attackers to break. In [4], the AES electronic codebook (ECB) encryption with three different graphic processing unit (GPU) architectures (i.e., Kepler, Maxwell, and Pascal) was implemented. Eight parallel AES architectures, focusing on different parallel granularities and thread block sizes, were introduced. Searching in terms of workload distribution over threads and thread blocks provided high performance, and it was especially effective in improving the workload per thread using the encryption AES block. Gilmore et al. [5] used neural networks to identify mask values and the secret key with a single attack trace. Hajihassani et al. [6] proposed a novel AES implementation using a bit-sliced approach. In this work, the existing row-first form of the input data was replaced with a column-first representation. Through a parallelization unit, each GPU thread simultaneously processed a number of thirty-row 128-bit data improving the performance and performance per cost.

Generally, hardware can be implemented as an Application Specific Integrated Circuit (ASIC) or on a Field Programmable Gate Array (FPGA), and the same holds for AES hardware [7]. The ASIC is an integrated circuit (IC) designed for a specific application. Compared to the FPGA, it requires less power, exhibits faster speed, and is suitable for mass production. However, it cannot be modified after fabrication. The FPGA, on the other hand, offers flexibility and reconfigurability and is suited to small-volume production. Earlier studies presented AES implementations using a pipelined structure, as shown in Fig. 3, to focus on low latency and high data throughput.

Mestiri et al. [8], to prevent fault injection attacks, divided the AES into two parts and inserted a pipeline between them. The S-box and inverse S-box were independently designed so that they could be implemented either with logic gates based on Galois Fields or with LUTs. Their design improved the area overhead, frequency, and throughput compared to the existing works. In [9], a false key-based AES design to withstand a correlation power analysis (CPA) attack was presented. The wave dynamic differential logic (WDDL)-based XOR gates hid the data correlated with the false key. Compared to the unprotected AES scheme, the false key and WDDL assisted AES architecture reduced the performance overhead, power, and area. Two AES encryption algorithms, adopting iterative looping and pipelining, were presented in [10]. The partial loop release approach, which used iterations and multistage pipelining, optimized area, throughput, and dynamic power consumption using only AES-128. In [11], a hardware approach for the AES encryption and decryption suitable for cipher block-chaining (CBC) mode was proposed. The inverting circuit for the SubByte and InvSubByte was integrated without any delay overhead. The critical path delay decreased considerably compared to the previous one, and the proposed approach was effective in terms of throughput per area due to the new operation-reordering and register-retiming techniques. A new S-box and key generation scheme to enhance the security of the AES algorithm was presented in [12]. The pseudo-noise (PN) sequence generator generated the S-box value and the initial key required for encryption and decryption. When using the Strict Avalanche Criterion for 2048, the avalanche effect showed better performance than the traditional AES design. Compared with the non-pipelined and pipelined models, this approach achieved higher throughput and smaller area.

Three high-throughput AES implementations in ECB mode and one ultra-high-throughput AES implementation in CTR mode were proposed in [13]. The designs used GF(2$^{8}$) to create an area-delay efficient multiplier that was applied to the AES. To achieve high throughput, loop-unrolling, full pipelining, and sub-pipelining techniques were used by inserting registers in appropriate positions. Srinivas et al. [14] proposed AES encryption and decryption adopting pre-computed LUTs. Instead of using the conventional GF(2$^{8}$) calculation formula, the MixColumns and InvMixColumns functions in the AES rounds were implemented using LUTs. The proposed design showed good performance in latency, throughput, area, and power. Liu et al. [15] developed an FPGA-based efficient pipelining AES structure for high-speed protection. The new key expansion approach increased the key complexity by up to 2N-1 for an N-round AES, which improved the throughput and security of the AES algorithm. In [16], an S-box based on the multiplexer LUT (MLUT), instead of the existing S-box using the inverse transform of multiplication of GF(2$^{8}$) and the affine transform, was proposed. The MLUT was based on a 256-byte to 1-byte multiplexer with 256-byte memory, so the circuit was simple, and power consumption was low. Since the variance of power dissipation for different processing data was small, the proposed S-box was also secure against the CPA-based side-channel attack (SCA). Zhang et al. [17] implemented SubByte and InvSubByte transformations using only combinational logic instead of an S-box based on a LUT. In this work, the unbreakable delay caused by the LUT was eliminated, and the benefits of sub-pipelining were exploited. In addition, they used composite field arithmetic to reduce hardware complexity and presented a key expansion architecture suitable for sub-pipelined round units. The proposed design was faster and achieved higher throughput per slice than the conventional one.

## 3. The Proposed AES Architecture

In the AES hardware design, our key contribution is replacing the conventional S-box and inverse S-box with new low-cost counterparts to greatly reduce hardware resource consumption. Specifically, we use the LFSR to implement SubByte and InvSubByte in the encryption and decryption rounds of the AES, as well as the entire round key scheduler, as shown in Fig. 4 (see gray boxes). First, Section 3.1 explains the basics of the LFSR and briefly introduces the two LFSR configurations implemented in the proposed architecture. Then, the proposed AES design that substitutes the S-box, inverse S-box, and key scheduler with the LFSRs is presented in Section 3.2.

### 3.1 Linear Feedback Shift Register

The LFSR is a shift register whose next input bit is calculated as a linear function of its current state. The initial value of the LFSR is called the seed, and a bit position that affects the next state of the LFSR is called a tap. Because an n-bit register can hold only 2$^{\mathrm{n}}$ distinct values, the state sequence of an LFSR is necessarily periodic. With a suitable choice of taps, the LFSR cycles through every state except the all-zero state, which maps to itself under XOR feedback, so the maximum period (i.e., length of the loop) is 2$^{\mathrm{n}}$-1. For example, a maximal-length 8-bit LFSR has a period of 2$^{8}$-1=255. The LFSR can be characterized by a polynomial. The minimal polynomial of a primitive element of the finite extension field GF(2$^{\mathrm{n}}$) is called a primitive polynomial. All primitive polynomials are irreducible because all minimal polynomials are irreducible. For pseudo-random bit generation, primitive polynomials over GF(2), the field with two elements, can be employed, and an LFSR built from a primitive polynomial attains the maximum period. If the taps of the LFSR are at the 8$^{\mathrm{th}}$, 6$^{\mathrm{th}}$, 5$^{\mathrm{th}}$, and 4$^{\mathrm{th}}$ bits, the characteristic polynomial of the LFSR is defined by

##### (1)
$P_{8}\left(x\right)=x^{8}+x^{6}+x^{5}+x^{4}+1$

Since the LFSR output values appear in a pseudo-random sequence, the LFSR can be used as a random number generator. Hence, it may be a suitable alternative to the AES S-box, which resembles a random substitution.
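The period claim for Eq. (1) can be checked with a few lines of software. The Python sketch below models an 8-bit LFSR with taps at bits 8, 6, 5, and 4 and counts the steps needed to return to the seed. The bit-numbering convention (tap t read at shift position 8 - t, feedback inserted at the MSB) and the names `fibonacci_step` and `period` are assumptions made for this illustration and need not match the register ordering in our hardware.

```python
TAPS_8 = (8, 6, 5, 4)  # tap positions of P_8(x) = x^8 + x^6 + x^5 + x^4 + 1


def fibonacci_step(state, taps=TAPS_8, width=8):
    """Advance the LFSR by one clock: XOR the tapped bits, shift, feed into the MSB."""
    feedback = 0
    for t in taps:
        feedback ^= (state >> (width - t)) & 1
    return (state >> 1) | (feedback << (width - 1))


def period(step, seed):
    """Count the steps needed for the state to return to the seed."""
    state, count = step(seed), 1
    while state != seed:
        state = step(state)
        count += 1
    return count


print(period(fibonacci_step, 0x01))  # length of the cycle that contains seed 0x01
print(fibonacci_step(0x00))          # the all-zero state never leaves zero
```

Any non-zero seed lies on the same cycle, which is why the all-zero seed has to be avoided when the register is used as a sequence generator.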

There are two methods to implement the LFSR: Fibonacci and Galois. Fig. 5 shows examples of the 8-bit LFSRs implemented using these two methods. In the Fibonacci method, the tapped bits are XORed together with the least significant bit (LSB), and the result is fed into the most significant bit (MSB) as the register shifts. In the Galois method, on the other hand, the LSB is fed back to the MSB, and it is also XORed into each tapped bit as that bit is shifted to the next position.
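The Galois method can be modeled in software in the same spirit. In the minimal Python sketch below, the LSB is shifted out and, when it is 1, XORed into the tapped positions through a mask. The mask value 0xB8 (bits 7, 5, 4, and 3 set, i.e., taps 8, 6, 5, and 4 of Eq. (1)) and the function names are assumptions made for this illustration.

```python
GALOIS_MASK_8 = 0xB8  # bits 7, 5, 4, 3 set: taps 8, 6, 5, 4 of P_8(x)


def galois_step(state, mask=GALOIS_MASK_8):
    """Advance the Galois LFSR by one clock."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= mask  # the shifted-out bit is XORed into every tapped position
    return state


def period(step, seed):
    """Count the steps needed for the state to return to the seed."""
    state, count = step(seed), 1
    while state != seed:
        state = step(state)
        count += 1
    return count


print(period(galois_step, 0x01))
```

In this form the XOR operations are applied in parallel inside the register rather than being chained in front of the MSB, which is the structural difference between the two methods.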

### 3.2 The Proposed AES Architecture

Fig. 6(a) shows the proposed S-box architecture using the Fibonacci LFSR for the SubByte of the round. This LFSR structure is also adopted in the round key scheduler. It includes sixteen 8-bit LFSRs with XOR gates. Note that the LFSR is based on Eq. (1), which is taken from the list of degree-8 primitive polynomials. The LFSR can also be implemented using the Galois method. Each 8-bit LFSR contains 8 D flip-flops and 3 XOR gates. A 128-bit input is divided into multiple 8-bit pieces, each of which is fed into one LFSR. The least significant 8 bits are taken as the input to the least significant LFSR to generate the corresponding 8-bit partial output. Each LFSR output is XORed with the next 8-bit partial input, and the result is fed into the next LFSR to produce the corresponding 8-bit partial output. These partial outputs are concatenated to form a 128-bit output, which is used for the SubByte in encryption and for the round keys. Note that additional LFSRs with XOR gates are required to produce the 192-bit and 256-bit outputs used in the AES-192 and AES-256 key schedulers. Importantly, instead of using 8-bit LFSRs, we can also use 16-bit and 32-bit LFSRs that include 16 and 32 D flip-flops with 3 XOR gates, respectively. Eight 16-bit or four 32-bit LFSRs are required to generate a 128-bit output. For the AES-192 and AES-256 key schedulers, the 16-bit LFSR-based design needs 4 and 8 additional LFSRs, and the 32-bit LFSR-based one requires 2 and 4 additional LFSRs, respectively, to produce the 192-bit and 256-bit outputs. The characteristic polynomials of the 16-bit LFSR, P$_{16}$(x), and the 32-bit LFSR, P$_{32}$(x), are as follows:

##### (2)
$P_{16}(x)=x^{16}+x^{14}+x^{13}+x^{11}+1$
##### (3)
$P_{32}(x)=x^{32}+x^{22}+x^{2}+x^{1}+1$

Note that $P_{16}(x)$ and $P_{32}(x)$ are taken from the lists of primitive polynomials of degree 16 and 32, respectively. The inverse S-box architecture using the 8-bit Fibonacci LFSR for InvSubByte is shown in Fig. 6(b). The inverse of the LFSR used for the AES encryption is utilized for the InvSubByte in decryption. As in the proposed S-box, the 128-bit input is split into multiple pieces. Each inverse LFSR takes 8 bits as its input, and its output is XORed with the input to produce the corresponding same-sized partial output. These partial outputs are combined to generate a 128-bit output for the InvSubByte used in decryption. Similarly, the inverse LFSR can be implemented by both the Fibonacci and Galois methods, and 16-bit and 32-bit inverse LFSRs can be leveraged to implement the inverse S-box. Also, the number of inverse LFSRs needed for InvSubByte in the AES-128, AES-192, and AES-256 is the same as the number of LFSRs required for SubByte.
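To make the data flow concrete, the Python sketch below models one possible reading of the chained structure in software: each byte is XORed with the previous partial output, passed through one LFSR step, and the results are concatenated; the inverse undoes the step and the XOR in the opposite order. Running a single LFSR step per byte, processing the least significant byte first, the initial carry of zero, and all function names are assumptions made for illustration. This is a behavioral sketch, not the Verilog design evaluated in Section 4.

```python
def lfsr_step(state):
    """One Fibonacci step with taps 8, 6, 5, 4 (assumed bit ordering)."""
    fb = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 4)) & 1
    return (state >> 1) | (fb << 7)


def lfsr_unstep(state):
    """Inverse of lfsr_step: shift back and recover the bit that was shifted out."""
    fb = state >> 7
    rest = (state << 1) & 0xFF  # old bits 7..1, with bit 0 still unknown
    bit0 = fb ^ ((rest >> 2) & 1) ^ ((rest >> 3) & 1) ^ ((rest >> 4) & 1)
    return rest | bit0


def substitute(block):
    """Chain the LFSRs: each partial output is XORed into the next partial input."""
    out, carry = [], 0
    for byte in block:  # least significant byte first (assumption)
        carry = lfsr_step(byte ^ carry)
        out.append(carry)
    return bytes(out)


def inverse_substitute(block):
    """Undo substitute(): inverse LFSR step, then XOR with the previous output."""
    out, carry = [], 0
    for byte in block:
        out.append(lfsr_unstep(byte) ^ carry)
        carry = byte
    return bytes(out)


plain = bytes(range(16))  # one 128-bit block, i.e., sixteen 8-bit pieces
assert inverse_substitute(substitute(plain)) == plain
print(len(plain) * 8 // 8, len(plain) * 8 // 16, len(plain) * 8 // 32)  # LFSR counts
```

The last line reproduces the LFSR counts given above for a 128-bit output: sixteen 8-bit, eight 16-bit, or four 32-bit LFSRs.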

## 4. Experimental Results

The proposed AES design was implemented in Verilog HDL and synthesized using a 32-nm CMOS technology to obtain the hardware resource consumption. For comparison, we also implemented the conventional AES design using the same design methodology. In our design, the LFSRs were implemented using both the Fibonacci and Galois methods so that the two could be compared. In addition, three designs corresponding to the key sizes, AES-128, AES-192, and AES-256, were implemented. The AES architecture with the proposed S-box and inverse S-box was functionally verified by confirming that the encrypted texts are correctly decrypted to the original plain texts.

Table 1 summarizes the hardware resource consumption of the proposed AES with various design configurations and the conventional AES. We compare the conventional AES design with our AES-128/192/256 designs using 8-, 16-, and 32-bit Fibonacci and Galois LFSRs. The proposed AES design reduces the area by more than 50% compared to the traditional design in all the AES design configurations. In most configurations, it also improves the delay, power, and energy.

For the AES-128, the Fibonacci LFSR design reduces the area, power, and energy by 55.9%, 26.1%, and 20% in the 8-bit configuration, by 56.7%, 28.3%, and 64.3% in the 16-bit configuration, and by 57.4%, 29.8%, and 66.7% in the 32-bit configuration, respectively. When the Galois LFSR is used, the 8-bit design decreases the area, power, and energy by 55.5%, 25.8%, and 16.6%, the 16-bit design by 56.7%, 28.9%, and 57.5%, and the 32-bit design by 57.4%, 30.1%, and 75%, respectively. In terms of delay, the 8-bit Fibonacci and Galois LFSRs increase the delay by 8.4% and 12.4%, respectively. However, when the 16-bit and 32-bit LFSRs are used, the delay decreases by 50.2% and 52.6% in the Fibonacci design and by 40.2% and 64.1% in the Galois design, respectively.

For the AES-192, the LFSR-based design decreases the delay by 19.5% compared to the conventional design for both the Fibonacci and Galois LFSRs. With the Fibonacci method, the area, power, and energy are reduced by 50.8%, 21.2%, and 36.6% for the 8-bit LFSR, by 51.5%, 23.5%, and 38.4% for the 16-bit LFSR, and by 52.1%, 25.3%, and 39.8% for the 32-bit LFSR, respectively. With the Galois method, the 8-bit design reduces the area, power, and energy by 50.9%, 22.8%, and 37.8%, the 16-bit design by 51.6%, 24.2%, and 39%, and the 32-bit design by 52.2%, 25.7%, and 40.2%, respectively.

For the AES-256, both the Fibonacci and Galois LFSRs reduce the delay by 11.2% in the 8-bit and 16-bit designs and by 35.7% in the 32-bit one, compared to the traditional design. When using the Fibonacci LFSR, the 8-bit design reduces the area, power, and energy by 53.4%, 23.6%, and 32.2%, the 16-bit one by 54.1%, 25.6%, and 33.9%, and the 32-bit one by 54.8%, 27.2%, and 53.1%, respectively. When the Galois LFSR is used, the area, power, and energy decrease by 53.6%, 26.4%, and 34.6% in the 8-bit design, by 54.2%, 27.2%, and 35.3% in the 16-bit one, and by 54.8%, 27.6%, and 53.4% in the 32-bit one, respectively.

In addition to the area-delay product (ADP), the energy-delay product (EDP) was calculated to examine the design from an energy efficiency perspective. The EDP comparison of the proposed design and the traditional AES is shown in Fig. 7. The AES-128 designs with the 8-, 16-, and 32-bit Fibonacci LFSRs decrease the EDP by 40.9%, 74.4%, and 76.6% compared to the traditional AES, respectively. The Galois LFSR, in turn, improves the EDP by 38.1%, 69.8%, and 82.5% using the 8-, 16-, and 32-bit designs, respectively. When the 8-, 16-, and 32-bit LFSRs are employed in the AES-192, the EDP decreases by 50%, 52.9%, and 55.1% in the Fibonacci design and by 52%, 53.7%, and 55.5% in the Galois design, respectively, compared to the existing design. Furthermore, the proposed AES-256 design improves the EDP by 48.2%, 50.8%, and 65.9% with the 8-, 16-, and 32-bit Fibonacci LFSRs and by 51.9%, 52.9%, and 66.3% with the 8-, 16-, and 32-bit Galois LFSRs, respectively.

The experimental results confirm that the 8-bit Galois LFSR in the AES-128 shows the smallest performance improvement and the 8-bit Fibonacci LFSR in the AES-192/256 results in the least performance improvement. Moreover, the 32-bit Galois LFSR among the proposed AES designs offers the greatest hardware efficiency.

##### Table 1. Comparison of AES hardware resource consumption.
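| AES variant | Design | Area (μm$^{2}$) | Delay (ns) | Power (mW) | Energy (pJ) |
|---|---|---|---|---|---|
| AES-128 | Conventional | 57407 | 2.51 | 8.3 | 20.83 |
| AES-128 | Fibonacci LFSR, 8-bit | 25317 | 2.72 | 6.13 | 16.67 |
| AES-128 | Fibonacci LFSR, 16-bit | 24884 | 1.25 | 5.95 | 7.44 |
| AES-128 | Fibonacci LFSR, 32-bit | 24473 | 1.19 | 5.83 | 6.94 |
| AES-128 | Galois LFSR, 8-bit | 25571 | 2.82 | 6.16 | 17.37 |
| AES-128 | Galois LFSR, 16-bit | 24835 | 1.5 | 5.9 | 8.85 |
| AES-128 | Galois LFSR, 32-bit | 24449 | 0.9 | 5.8 | 5.22 |
| AES-192 | Conventional | 66830 | 1.13 | 10.04 | 11.35 |
| AES-192 | Fibonacci LFSR, 8-bit | 32902 | 0.91 | 7.91 | 7.20 |
| AES-192 | Fibonacci LFSR, 16-bit | 32428 | 0.91 | 7.68 | 6.99 |
| AES-192 | Fibonacci LFSR, 32-bit | 31984 | 0.91 | 7.5 | 6.83 |
| AES-192 | Galois LFSR, 8-bit | 32792 | 0.91 | 7.75 | 7.05 |
| AES-192 | Galois LFSR, 16-bit | 32373 | 0.91 | 7.61 | 6.93 |
| AES-192 | Galois LFSR, 32-bit | 31957 | 0.91 | 7.46 | 6.79 |
| AES-256 | Conventional | 76525 | 1.43 | 11.52 | 16.47 |
| AES-256 | Fibonacci LFSR, 8-bit | 35673 | 1.27 | 8.8 | 11.18 |
| AES-256 | Fibonacci LFSR, 16-bit | 35133 | 1.27 | 8.57 | 10.88 |
| AES-256 | Fibonacci LFSR, 32-bit | 34627 | 0.92 | 8.39 | 7.72 |
| AES-256 | Galois LFSR, 8-bit | 35543 | 1.27 | 8.48 | 10.77 |
| AES-256 | Galois LFSR, 16-bit | 35067 | 1.27 | 8.39 | 10.66 |
| AES-256 | Galois LFSR, 32-bit | 34594 | 0.92 | 8.37 | 7.67 |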
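The percentage reductions quoted for the AES-128 can be recomputed directly from the Table 1 entries. The Python sketch below does so for area, delay, power, and energy, and also derives the ADP as area multiplied by delay; the variable and function names are our own, and only the AES-128 rows are included for brevity.

```python
# Columns: area (um^2), delay (ns), power (mW), energy (pJ); AES-128 rows of Table 1.
CONVENTIONAL = (57407, 2.51, 8.3, 20.83)
LFSR_DESIGNS = {
    "Fibonacci 8-bit": (25317, 2.72, 6.13, 16.67),
    "Fibonacci 16-bit": (24884, 1.25, 5.95, 7.44),
    "Fibonacci 32-bit": (24473, 1.19, 5.83, 6.94),
    "Galois 8-bit": (25571, 2.82, 6.16, 17.37),
    "Galois 16-bit": (24835, 1.5, 5.9, 8.85),
    "Galois 32-bit": (24449, 0.9, 5.8, 5.22),
}
LABELS = ("area", "delay", "power", "energy", "ADP")


def reduction(baseline, value):
    """Percentage reduction relative to the baseline (negative means an increase)."""
    return 100.0 * (baseline - value) / baseline


def adp(row):
    """Area-delay product of one table row."""
    return row[0] * row[1]


for name, row in LFSR_DESIGNS.items():
    cuts = [reduction(b, v) for b, v in zip(CONVENTIONAL, row)]
    cuts.append(reduction(adp(CONVENTIONAL), adp(row)))
    print(name, ", ".join(f"{label} {cut:.1f}%" for label, cut in zip(LABELS, cuts)))
```

A negative delay value for the 8-bit designs reflects the delay increase reported in the text for those configurations.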

## 5. Conclusion

We proposed a novel low-cost AES architecture exploiting an LFSR-based S-box for IoT applications. The existing S-box and inverse S-box of the AES are replaced by the proposed LFSR-based designs to significantly reduce the hardware resource consumption. In addition, the round key scheduler is replaced with an LFSR-based design. LFSRs of various sizes, using both the Fibonacci and Galois methods, are used in designing the SubByte of the AES encryption round, the InvSubByte of the decryption round, and the entire key scheduler. To compare the proposed design with the conventional S-box and inverse S-box based AES, we implemented various designs in a 32-nm CMOS technology and functionally verified the AES encryption and decryption processes. The results show that the proposed designs reduce the area, delay, power, and energy by 57.4%, 64.1%, 30.1%, and 75% in the AES-128, 52.2%, 12.5%, 21.9%, and 31.7% in the AES-192, and 54.8%, 5.2%, 23.4%, and 27.4% in the AES-256, respectively, compared to the traditional one. In terms of ADP and EDP, the AES-128 shows improvements of at least 50% and 38.1%, with a maximum of 84.7% and 82.5%, respectively. The AES-192 achieves ADP and EDP improvements of at least 60.4% and 50%, with a maximum of 61.5% and 55.5%, respectively. The AES-256 improves the ADP and EDP by at least 58.6% and 48.2%, with a maximum of 70.9% and 66.3%, respectively. According to our experiments, the AES-128, AES-192, and AES-256 designs with the 32-bit Galois LFSR demonstrate the best efficiency among the various design configurations, and they show significantly improved hardware efficiency compared to the existing ones. Accordingly, the proposed LFSR-based S-box and inverse S-box can be easily applied to the AES hardware design to considerably reduce the hardware overheads, which makes the proposed AES architecture suitable for providing security for IoT devices.

### ACKNOWLEDGMENTS

This work was supported in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2019R1I1A3A01061266) and in part by the BK21 FOUR project (AI-driven Convergence Software Education Research Program) funded by the Ministry of Education, School of Computer Science and Engineering, Kyungpook National University, Korea (4199990214394).

### REFERENCES

1. National Institute of Standards and Technology (NIST), Nov. 2001, Advanced Encryption Standard (AES), Federal Information Processing Standards (FIPS) publication 197
2. Lee D., Kim Y., 2021, Design of a Light-Weight Key Scheduler for AES using LFSR for IoT Applications, IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia), pp. 1-2
3. Kumar P., Rana S. B., 2016, Development of Modified AES Algorithm for Data Security, Optik, Vol. 127, No. 4, pp. 2341-2345
4. Abdelrahman A. A., Fouad M. M., Dahshan H., Mousa A. M., 2017, High Performance CUDA AES Implementation: A Quantitative Performance Analysis Approach, Computing Conference, pp. 1077-1085
5. Gilmore R., Hanley N., O'Neill M., 2015, Neural Network based Attack on a Masked Implementation of AES, IEEE International Symposium on Hardware Oriented Security and Trust (HOST), pp. 106-111
6. Hajihassani O., Monfared S. K., Khasteh S. H., Gorgin S., 2019, Fast AES Implementation: A High-Throughput Bitsliced Approach, IEEE Transactions on Parallel and Distributed Systems, Vol. 30, No. 10, pp. 2211-2222
7. Gaj K., Chodowiec P., 2009, FPGA and ASIC Implementations of AES, Cryptographic Engineering, pp. 235-294
8. Mestiri H., et al., 2016, A High-Speed AES Design Resistant to Fault Injection Attacks, Microprocessors and Microsystems, Vol. 41, pp. 47-55
9. Yu W., Köse S., 2017, A Lightweight Masked AES Implementation for Securing IoT Against CPA Attacks, IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 64, No. 11, pp. 2934-2944
10. Soliman S. M., Magdy B., Abd El Ghany M. A., 2016, Efficient Implementation of the AES Algorithm for Security Applications, IEEE International System-on-Chip Conference (SOCC), pp. 206-210
11. Ueno R., Morioka S., Homma N., Aoki T., 2016, A High Throughput/Gate AES Hardware Architecture by Compressing Encryption and Decryption Datapaths, International Conference on Cryptographic Hardware and Embedded Systems, pp. 538-558
12. Zodpe H., Sapkal A., 2020, An Efficient AES Implementation using FPGA with Enhanced Security Features, Journal of King Saud University - Engineering Sciences, Vol. 32, No. 2, pp. 115-122
13. Soltani A., Sharifian S., 2015, An Ultra-High Throughput and Fully Pipelined Implementation of AES Algorithm on FPGA, Microprocessors and Microsystems, Vol. 39, No. 7, pp. 480-493
14. Srinivas N. S. S., Akramuddin M., 2016, FPGA based Hardware Implementation of AES Rijndael Algorithm for Encryption and Decryption, International Conference on Electrical Electronics and Optimization Techniques (ICEEOT), pp. 1769-1776
15. Liu Q., Xu Z., Yuan Y., 2015, High Throughput and Secure Advanced Encryption Standard on Field Programmable Gate Array with Fine Pipelining and Enhanced Key Expansion, IET Computers & Digital Techniques, Vol. 9, No. 3, pp. 175-184
16. Pammu A. A., Chong K.-S., Lwin Ne K. Z., Gwee B.-H., 2016, High Secured Low Power Multiplexer-LUT Based AES S-Box Implementation, International Conference on Information Systems Engineering (ICISE), pp. 3-7
17. Zhang X., Parhi K. K., 2004, High-Speed VLSI Architectures for the AES Algorithm, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 12, No. 9, pp. 957-967

## Author

##### Donghui Lee

Donghui Lee received his B.S. degree from the Department of Electric and Electronic Engineering at Halla University, Wonju, Republic of Korea, in 2020. He is pursuing an M.S. degree in the School of Computer Science and Engineering at Kyungpook National University, Daegu, Republic of Korea. His research interests include artificial intelligence (AI) accelerators and hardware design.

##### Myeongjin Kwak

Myeongjin Kwak received the B.S. degree from the School of Computer Science and Engineering at Kyungpook National University, Daegu, the Republic of Korea in 2021, where he is currently pursuing an M.S. degree. His research interests include machine learning and neuromorphic computing.

##### Jungwon Lee

Jungwon Lee received the B.S. degree from the School of Computer Science and Engineering at Kyungpook National University, Daegu, the Republic of Korea in 2021, where she is currently pursuing an M.S. degree. Her research interests include deep learning and approximate arithmetic.

##### Beomjun Kim

Beomjun Kim received his B.S. degree from the School of Computer Science and Engineering at the Kyungpook National University, Daegu, South Korea. Now he is pursuing an M.E. degree in the School of Computer Science and Engineering at the Kyungpook National University. His research interests include non-volatile memory, heterogeneous memory system, data compression, and computer architecture.

##### Yongtae Kim

Yongtae Kim received the B.S. and M.S. degrees in electrical engineering from Korea University, Seoul, Republic of Korea, in 2007 and 2009, respectively, and the Ph.D. degree from the Department of Electrical and Computer Engineering at Texas A&M University, College Station, TX, in 2013. From 2013 to 2018, he was a software engineer with Intel Corporation, Santa Clara, CA. Since 2018, he has been with the School of Computer Science and Engineering at Kyungpook National University, Daegu, the Republic of Korea, where he is currently an assistant professor. His research interests are energy-efficient integrated circuits and systems, particularly neuromorphic computing and approximate computing, and new memory devices and architectures.