
(School of Microelectronics, Hefei University of Technology, Hefei, China; ahzhangyq@hfut.edu.cn, 2191158315@qq.com, 1617090911@qq.com, gjxie8005@hfut.edu.cn)

Keywords: Approximate computing, multiplier, compressor, energy consumption, image multiplication

1. Introduction

Approximate computing is an attractive paradigm in circuit design: by relaxing the requirement for fully accurate operations, it reduces power consumption, delay, and area at the expense of computing accuracy. The trade-off between hardware cost and computing accuracy is especially relevant to error-resilient applications, such as machine learning and multimedia processing.

Multipliers are basic building blocks of digital systems and usually operate in three steps: 1) generating the partial products, 2) reducing the partial products, and 3) summing the final result. The second step accounts for the dominant hardware cost, so efficient compressors can significantly reduce its complexity and thus improve the performance of multipliers [1]; in particular, 4-2 compressors are widely applied to accelerate partial product reduction. In [2], a compressor ignored the input signal cin and the output signal cout to improve multiplier power and delay; the multiplier utilizing this compressor shows a large reduction in hardware requirements and transistor count compared to existing designs. Three 4-2 compressors were proposed in [3] by modifying the truth table of the exact compressor; however, the multipliers using these compressors were inferior in overall performance. In [4], a partial-product-altering method was applied to a 4-2 compressor, realizing a balance between hardware cost and multiplier accuracy. A compressor using a majority gate was designed in [5] by ignoring the input signals x$_{2}$ and cin and the output signal cout to achieve excellent power and delay performance. The stacking circuit technique was adopted in [6] to design approximate multipliers with high computing accuracy, although at a high hardware cost. In [7], a new compressor was designed using only simple AND-OR gates, and the multiplier utilizing it provided a good error-electrical performance trade-off. The dual-quality 4-2 compressors introduced in [8] can be switched flexibly between precise and approximate operating modes, so multipliers using these compressors can change accuracy dynamically at runtime.

To improve the trade-off between hardware cost and computing accuracy in approximate circuits, this paper proposes a set of approximate 8${\times}$8 Dadda multipliers. To that end, an imprecise 4-2 compressor using only OR and XNOR gates is designed by introducing symmetrical errors into the truth table of the exact compressor; these errors can counteract each other within a multiplier. This method reduces the design complexity of multipliers in area, power, and delay while producing satisfactory results. The main contributions of this paper are summarized as follows.

1) An approximate 4-2 compressor is proposed to simplify the partial product reduction step in multipliers.

2) A set of approximate Dadda multipliers is built from the compressors to find a better structure with a lower hardware cost and higher computing accuracy.

3) The image multiplication operation is realized through these multipliers to evaluate computing accuracy in real applications.

4) The trade-off between hardware cost and accuracy in the multipliers is comprehensively analyzed through various evaluation criteria as an example in approximate computing.

This paper proceeds as follows. In Section 2, the previous approximate 4-2 compressors are reviewed. Section 3 presents the proposed approximate compressor and multipliers. The synthesis results and their application to image processing are presented in Section 4. Section 5 concludes this paper.

2. Related Work

In this paper, we look to 4-2 compressors to build 8${\times}$8 Dadda multipliers owing to their simplified structure and high efficiency in transistor-level implementations. In recent years, several methods have been proposed to design imprecise 4-2 compressors, and they were utilized to design approximate multipliers. Some previous approximate designs that ignored cin and cout are summarized and compared in this section.

In the approximate 4-2 compressor presented in [2], the delay of the critical path is less than the previous design, and the number of gates was further reduced. Three approximate 4-2 compressors were proposed in [3]; they use a k-map to obtain simplified logical expressions that reduce errors while providing a significant performance improvement over previous 4-2 compressors. The first and the second designs in [3] only have four gates, which greatly simplifies the structural complexity. The third design is the most accurate while having a more complex structure compared with other designs. In [4], to simplify the circuit of the 4-2 compressor, an OR gate replaces an XOR gate to compute a sum, thus introducing additional errors. An ultra-efficient compressor proposed in [5] consists of one majority gate, which is different from conventional designs. Since input x$_{2}$ is omitted, and output sum is always equal to 1, this approximate compressor reaches a simpler logic implementation. The compressors in [6] have high accuracy, using the stacking circuit technique. A hardware-efficient approximate compressor proposed in [9] was obtained by modifying the truth table of the exact compressor, and consists of only three NOR gates and one NAND gate. In [10], an ultra-compact 4-2 compressor was proposed based on simple AND-OR logic, which leads to a trade-off between hardware cost and precision. In [11], the proposed compressor was obtained by modifying an approximate compressor, and the performance of the applied multiplier improved. Three approximate compressors were presented in [12], and they all innovatively reduced the number of outputs to one, thus significantly reducing the hardware cost.

3. The Proposed Compressor and Multipliers

3.1 The Compressor

As shown in Fig. 1, an exact 4-2 compressor generally consists of two full adders with five inputs (x$_{1}$, x$_{2}$, x$_{3}$, x$_{4}$, and cin) and three outputs (sum, carry, and cout) [13]. The outputs count the number of logic 1s among the five inputs according to (1), (2), and (3):

(1)
$sum=x_{1}\oplus x_{2}\oplus x_{3}\oplus x_{4}\oplus c_{in}$
(2)
$cout=\left(x_{1}\oplus x_{2}\right)x_{3}+\overline{\left(x_{1}\oplus x_{2}\right)}x_{1}$
(3)
$carry=\left(x_{1}\oplus x_{2}\oplus x_{3}\oplus x_{4}\right)c_{in}+\overline{\left(x_{1}\oplus x_{2}\oplus x_{3}\oplus x_{4}\right)}x_{4}$

The four inputs, x$_{1}$, x$_{2}$, x$_{3}$, and x$_{4}$, and the output sum have the same weight, whereas the weights of cout and carry are one binary bit order higher [12,14]. Therefore, cout and carry are delivered to the next module of higher significance.
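As a sanity check on (1)-(3), the sketch below (our own helper function, not part of the paper's design) exhaustively confirms that sum, carry, and cout together count the ones among the five inputs, with carry and cout carrying double weight:

```python
from itertools import product

def exact_compressor(x1, x2, x3, x4, cin):
    s = x1 ^ x2 ^ x3 ^ x4 ^ cin                   # eq. (1)
    cout = (x1 ^ x2) & x3 | (1 - (x1 ^ x2)) & x1  # eq. (2)
    w = x1 ^ x2 ^ x3 ^ x4
    carry = w & cin | (1 - w) & x4                # eq. (3)
    return s, carry, cout

# Over all 32 input combinations, the weighted outputs equal the ones-count.
for bits in product((0, 1), repeat=5):
    s, carry, cout = exact_compressor(*bits)
    assert s + 2 * (carry + cout) == sum(bits)
```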

In this work, the proposed 4-2 compressor (Fig. 2) is derived by modifying the truth table of the exact compressor to obtain the simpler logic expressions in (4) and (5), ignoring signals cin and cout for design efficiency, as in previous work [2]. Inputs x$_{1}$ and x$_{2}$ are also omitted, which further simplifies the compressor and greatly reduces its energy and critical path delay; as a result, it requires only an OR gate and an XNOR gate. Although omitting x$_{1}$ and x$_{2}$ introduces certain errors, the proposed compressors are used only in the approximate part of the multipliers, so the impact on computing accuracy is small. Attention is therefore paid to the hardware/accuracy trade-off of the multipliers rather than to any single indicator.

(4)
$carry=x_{3}+x_{4}$
(5)
$sum=x_{3}\odot x_{4}$

As seen in the truth table in Table 1, the proposed design produces eight erroneous outputs out of 16. Error is defined as the arithmetic distance between the exact and approximate values [15]. For example, when all four inputs are 1, the exact output is 4, while the proposed compressor produces 1 for both sum and carry, giving a decimal output of 3 and an error distance of 1. The maximum error magnitude of this design is 1 (errors of +1 and -1), which avoids unacceptable results when the compressor is applied to approximate multipliers. Moreover, in the structure of a multiplier, error distances with opposite signs (-1 and +1) counteract each other [5].

Table 1. Truth table of the proposed 4-2 compressor.
x4 | x3 | x2 | x1 | exact | carry | sum | approximate | error
0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | -1
0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0
0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0
0 | 0 | 1 | 1 | 2 | 0 | 1 | 1 | 1
0 | 1 | 0 | 0 | 1 | 1 | 0 | 2 | -1
0 | 1 | 0 | 1 | 2 | 1 | 0 | 2 | 0
0 | 1 | 1 | 0 | 2 | 1 | 0 | 2 | 0
0 | 1 | 1 | 1 | 3 | 1 | 0 | 2 | 1
1 | 0 | 0 | 0 | 1 | 1 | 0 | 2 | -1
1 | 0 | 0 | 1 | 2 | 1 | 0 | 2 | 0
1 | 0 | 1 | 0 | 2 | 1 | 0 | 2 | 0
1 | 0 | 1 | 1 | 3 | 1 | 0 | 2 | 1
1 | 1 | 0 | 0 | 2 | 1 | 1 | 3 | -1
1 | 1 | 0 | 1 | 3 | 1 | 1 | 3 | 0
1 | 1 | 1 | 0 | 3 | 1 | 1 | 3 | 0
1 | 1 | 1 | 1 | 4 | 1 | 1 | 3 | 1
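The truth table can be verified mechanically. A short check of (4) and (5) against Table 1, with the error taken as the exact ones-count minus the approximate value (cin = 0):

```python
from itertools import product

errors = []
for x4, x3, x2, x1 in product((0, 1), repeat=4):
    carry, s = x3 | x4, 1 - (x3 ^ x4)              # eqs. (4) and (5)
    errors.append((x1 + x2 + x3 + x4) - (2 * carry + s))

# Eight of the 16 rows are erroneous, every error distance is 1,
# and the +1 and -1 errors balance out.
assert sum(e != 0 for e in errors) == 8
assert max(abs(e) for e in errors) == 1
assert sum(errors) == 0
```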

3.2 The Approximate Multipliers

To investigate the impact of the proposed compressor on multiplication, 8${\times}$8 Dadda multipliers with various levels of accuracy are designed. The basic structure of the approximate Dadda multiplier was described in [2] where the multiplier uses AND gates to generate all partial products in the first step, and then uses approximate compressors to compress them into, at most, two rows. In the last step, an exact ripple carry adder computes the results.

Table 3. Hardware comparison of 8${\times}$8 multipliers.
Design | Area (${\mu}$m$^{2}$) | Power (mW) | Delay (ns) | PDP (fJ) | EDP (fJ∙ns)
M753 | 360.00 | 4.76×10$^{-2}$ | 1.55 | 73.78 | 114.36
M744 | 342.36 | 4.51×10$^{-2}$ | 1.56 | 70.36 | 109.76
M735 | 331.92 | 4.36×10$^{-2}$ | 1.54 | 67.14 | 103.40
M726 | 329.76 | 4.13×10$^{-2}$ | 1.56 | 64.43 | 100.51
M717 | 292.68 | 3.69×10$^{-2}$ | 1.63 | 60.15 | 98.04
M663 | 314.64 | 4.03×10$^{-2}$ | 1.46 | 58.84 | 85.90
M654 | 298.80 | 3.83×10$^{-2}$ | 1.44 | 55.15 | 79.42
M645 | 285.84 | 3.61×10$^{-2}$ | 1.42 | 51.26 | 72.79
M636 | 267.84 | 3.42×10$^{-2}$ | 1.42 | 48.56 | 68.96
M627 | 246.24 | 3.04×10$^{-2}$ | 1.32 | 40.13 | 52.97
M618 | 227.16 | 2.71×10$^{-2}$ | 1.35 | 36.59 | 49.39
M573 | 275.40 | 3.38×10$^{-2}$ | 1.38 | 46.64 | 64.37
M564 | 258.84 | 3.16×10$^{-2}$ | 1.26 | 39.82 | 50.17
M555 | 245.88 | 2.99×10$^{-2}$ | 1.29 | 38.57 | 49.76
M546 | 226.08 | 2.78×10$^{-2}$ | 1.27 | 35.31 | 44.84
M537 | 207.36 | 2.47×10$^{-2}$ | 1.27 | 31.37 | 39.84
M528 | 185.40 | 2.18×10$^{-2}$ | 1.21 | 26.38 | 31.92
M519 | 160.56 | 1.83×10$^{-2}$ | 1.26 | 23.06 | 29.05
[2] | 389.52 | 3.73×10$^{-2}$ | 1.71 | 63.78 | 109.07
[3]1 | 398.52 | 3.52×10$^{-2}$ | 1.58 | 55.62 | 87.87
[3]2 | 423.36 | 3.72×10$^{-2}$ | 1.85 | 68.82 | 127.32
[3]3 | 420.12 | 3.36×10$^{-2}$ | 1.89 | 63.50 | 120.02
[4] | 325.44 | 3.13×10$^{-2}$ | 1.52 | 47.58 | 72.32
[5] | 264.24 | 2.76×10$^{-2}$ | 1.35 | 37.26 | 50.30
[6]1 | 498.96 | 6.4×10$^{-2}$ | 1.66 | 106.24 | 176.36
[6]2 | 510.84 | 6.9×10$^{-2}$ | 1.73 | 119.37 | 206.51
[6]3 | 567.72 | 7.35×10$^{-2}$ | 1.77 | 130.10 | 230.27
Exact | 577.80 | 7.81×10$^{-2}$ | 1.81 | 141.36 | 255.86

As seen from the results in Table 3, the M5${\beta}$${\gamma}$ multipliers have the smallest area, power, and delay of the three types, the M7${\beta}$${\gamma}$ multipliers have the largest, and the M6${\beta}$${\gamma}$ multipliers lie in between, as determined by ${\alpha}$. For each type (e.g., M7${\beta}$${\gamma}$), as ${\gamma}$ increases, ${\beta}$ decreases, and the hardware cost is further reduced by the larger ${\gamma}$. PDP and EDP are reported to further assess the performance of these multipliers, and they follow the same trend.

Table 4. ER, MED, and NMED of approximate 8${\times}$8 multipliers.
Design | ER (%) | MED | NMED
M753 | 99.77 | 1.96×10$^{2}$ | 3.01×10$^{-3}$
M744 | 99.83 | 1.88×10$^{2}$ | 2.89×10$^{-3}$
M735 | 99.80 | 1.68×10$^{2}$ | 2.58×10$^{-3}$
M726 | 99.51 | 1.31×10$^{2}$ | 2.01×10$^{-3}$
M717 | 99.22 | 1.72×10$^{2}$ | 2.65×10$^{-3}$
M663 | 99.89 | 3.49×10$^{2}$ | 5.36×10$^{-3}$
M654 | 99.91 | 3.41×10$^{2}$ | 5.25×10$^{-3}$
M645 | 99.91 | 3.22×10$^{2}$ | 4.95×10$^{-3}$
M636 | 99.83 | 2.81×10$^{2}$ | 4.33×10$^{-3}$
M627 | 99.66 | 2.63×10$^{2}$ | 4.04×10$^{-3}$
M618 | 99.51 | 4.29×10$^{2}$ | 6.60×10$^{-3}$
M573 | 99.95 | 6.78×10$^{2}$ | 10.42×10$^{-3}$
M564 | 99.95 | 6.71×10$^{2}$ | 10.32×10$^{-3}$
M555 | 99.95 | 6.55×10$^{2}$ | 10.08×10$^{-3}$
M546 | 99.92 | 6.11×10$^{2}$ | 9.40×10$^{-3}$
M537 | 99.85 | 5.64×10$^{2}$ | 8.67×10$^{-3}$
M528 | 99.83 | 4.79×10$^{2}$ | 7.36×10$^{-3}$
M519 | 99.80 | 8.01×10$^{2}$ | 12.33×10$^{-3}$
[2] | 99.10 | 3.15×10$^{3}$ | 48.46×10$^{-3}$
[3]1 | 87.19 | 3.62×10$^{3}$ | 55.73×10$^{-3}$
[3]2 | 87.19 | 4.17×10$^{3}$ | 64.2×10$^{-3}$
[3]3 | 97.26 | 5.91×10$^{3}$ | 90.92×10$^{-3}$
[4] | 85.73 | 2.24×10$^{3}$ | 34.41×10$^{-3}$
[5] | 99.82 | 4.94×10$^{2}$ | 7.60×10$^{-3}$
[6]1 | 55.34 | 0.70×10$^{2}$ | 1.07×10$^{-3}$
[6]2 | 17.96 | 0.17×10$^{2}$ | 0.26×10$^{-3}$
[6]3 | 3.59 | 0.03×10$^{2}$ | 0.04×10$^{-3}$

Compared to previous work, the NMED of the proposed multipliers is not the lowest; however, it is acceptable for most image processing applications [17]. M528 achieved better accuracy than all previous designs except those in [6]. Although the multipliers in [6] have an advantage in the accuracy metrics, they carry the highest hardware cost, as shown in Table 3. Therefore, all performance evaluation metrics should be taken into account.
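The metrics in Table 4 follow the standard definitions [15]: ER is the fraction of input pairs producing a wrong result, MED is the mean error distance, and NMED normalizes MED by the maximum exact output. A minimal sketch of those definitions, exercised on a hypothetical truncation-based approximate product (purely illustrative, not one of the proposed multipliers):

```python
def error_metrics(approx_fn, bits=8):
    # Exhaustive evaluation over all pairs of unsigned `bits`-wide operands.
    n = 1 << bits
    errs = [abs(a * b - approx_fn(a, b)) for a in range(n) for b in range(n)]
    er = sum(e != 0 for e in errs) / len(errs)   # error rate
    med = sum(errs) / len(errs)                  # mean error distance
    nmed = med / ((n - 1) ** 2)                  # normalized by the max product
    return er, med, nmed

# Illustrative approximate product: zero the four least significant bits.
er, med, nmed = error_metrics(lambda a, b: (a * b) & ~0xF)
```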

The error distribution of the proposed multipliers, including M7${\beta}$${\gamma}$, M6${\beta}$${\gamma}$, and M5${\beta}$${\gamma}$, is shown in Fig. 4, where the errors mainly fall in the ranges [-600, 600], [-1000, 1000], and [-2000, 1000], respectively, accounting on average for about 83%, 84%, and 84% of all errors. Thus, reserving an appropriate number of the most significant bits as exact preserves the accuracy of a multiplier.

Fig. 4. Error distance from the multipliers: (a) M5${\beta}$${\gamma}$; (b) M6${\beta}$${\gamma}$; (c) M7${\beta}$${\gamma}$.

As seen from the results above, M5${\beta}$${\gamma}$ had the better hardware metrics but a worse NMED, while M7${\beta}$${\gamma}$ had the better NMED but a worse hardware cost. Thus, to reconcile the trade-off between accuracy and hardware cost, a figure of merit (FOM) was suggested in [8]. Because the proposed multipliers have a relatively small delay, delay is excluded for a fair comparison, and the FOM is modified as in (8) [5]:

(8)
$FOM1=PDP\times Area/\left(1-NMED\right)$

Fig. 5 shows FOM1 for the proposed and existing approximate 8${\times}$8 multipliers. The smaller the value of FOM1, the better the trade-off between accuracy and hardware. Thus, M627, M618, M564, M555, M546, M537, M528, and M519 have a lower FOM1 compared with other designs, indicating that most of the proposed multipliers offer a better trade-off than previous designs.
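As a quick numerical spot check of (8) (our own arithmetic, with PDP, area, and NMED values copied from Tables 3 and 4), M528 compares favorably with both the exact design and [5]:

```python
def fom1(pdp_fj, area_um2, nmed):
    # FOM1 = PDP x Area / (1 - NMED), per (8); smaller is better.
    return pdp_fj * area_um2 / (1 - nmed)

m528 = fom1(26.38, 185.40, 7.36e-3)   # M528, Tables 3 and 4
exact = fom1(141.36, 577.80, 0.0)     # exact design: NMED = 0
```

Since NMED enters only as 1/(1 - NMED), small accuracy losses barely inflate FOM1, whereas PDP and area enter multiplicatively; this is why the low-cost proposed designs rank well.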

4.3 Image Multiplication

To assess the practicality of the approximate multipliers in real applications, they were applied to image multiplication, a widely used operation in image processing. The discussed multipliers multiplied two images pixel by pixel, thereby blending them into a single image [18-21].

The peak signal-to-noise ratio (PSNR) and the mean structural similarity index metric (MSSIM) [22] were computed to evaluate the quality of the processed images. PSNR is expressed in (9):

(9)
$PSNR=10\log _{10}\left(\frac{w\times r\times MAX^{2}}{\sum _{i=0}^{w-1}\sum _{j=0}^{r-1}\left[S'\left(i,j\right)-S\left(i,j\right)\right]^{2}}\right)$

where w and r are the width and height of the image, $\textit{S'(i, j)}$ and S(i, j) represent the exact and approximate value of each pixel, respectively, and MAX is the maximum pixel value. The larger the PSNR, the better the image. MSSIM is expressed in (10):

(10)
$MSSIM\left(X,Y\right)=\frac{1}{k}\sum _{i=1}^{k}\frac{\left(2\mu _{x}\mu _{y}+C_{1}\right)\left(2\sigma _{xy}+C_{2}\right)}{\left(\mu _{x}^{2}+\mu _{y}^{2}+C_{1}\right)\left(\sigma _{x}^{2}+\sigma _{y}^{2}+C_{2}\right)}$

where X and Y represent two images. Other parameters can be found in detail in [22]. MSSIM reaches 1 when the two processed images are the same.
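Both quality metrics can be sketched directly from (9) and (10). The `psnr` function below follows (9); the `ssim` function evaluates the term inside the sum of (10) over a single window (k = 1), with C$_{1}$ = (0.01×255)$^{2}$ and C$_{2}$ = (0.03×255)$^{2}$ as in [22] for 8-bit images, so it is a simplification of the full windowed MSSIM:

```python
import math

def psnr(exact_img, approx_img, max_val=255):
    # Eq. (9): images as row-major nested lists, r rows of w pixels.
    r, w = len(exact_img), len(exact_img[0])
    sq_err = sum((s1 - s2) ** 2
                 for row1, row2 in zip(exact_img, approx_img)
                 for s1, s2 in zip(row1, row2))
    return 10 * math.log10(w * r * max_val ** 2 / sq_err)

def ssim(x, y, c1=6.5025, c2=58.5225):
    # Eq. (10) for one window: means, variances, and covariance of x and y.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((p - mx) ** 2 for p in x) / n
    vy = sum((q - my) ** 2 for q in y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y)) / n
    return (((2 * mx * my + c1) * (2 * cov + c2)) /
            ((mx * mx + my * my + c1) * (vx + vy + c2)))
```

For a 2×2 image in which one pixel is off by 16, `psnr` lands just above the 30 dB quality threshold discussed below, and `ssim` of an image with itself is 1, matching the exact design.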

Table 5 shows PSNR and MSSIM values for five image multiplication examples. All the proposed multipliers achieved PSNR values higher than 30 dB for the various images; a PSNR above 30 dB is generally regarded as good enough [23]. Besides, the MSSIM results for all the approximate multipliers are very close to that of the exact design (MSSIM = 1). Moreover, both PSNR and MSSIM increase as the number of exact columns increases.

Table 5. PSNR and MSSIM of multiplied images using the 8${\times}$8 multipliers.
PSNR (dB):

Design | Lena×LenaRGB | Baboon×BaboonRGB | Goldhill×Goldhill | Goldhill×LenaRGB | Goldhill×BaboonRGB
M753 | 46.03 | 45.13 | 46.20 | 45.97 | 45.72
M744 | 46.33 | 45.43 | 46.50 | 46.25 | 46.02
M735 | 47.15 | 46.17 | 47.26 | 46.97 | 46.72
M726 | 48.56 | 48.24 | 48.46 | 48.89 | 48.80
M717 | 46.66 | 47.30 | 45.68 | 46.60 | 46.73
M663 | 41.55 | 40.19 | 38.99 | 41.55 | 41.25
M654 | 41.70 | 40.32 | 39.08 | 41.70 | 41.41
M645 | 42.12 | 40.74 | 39.44 | 42.11 | 41.82
M636 | 43.02 | 41.79 | 40.33 | 43.25 | 43.12
M627 | 43.64 | 42.99 | 41.64 | 43.60 | 43.65
M618 | 39.72 | 39.55 | 36.71 | 39.51 | 39.42
M573 | 34.54 | 34.98 | 34.36 | 36.07 | 35.65
M564 | 34.61 | 35.05 | 34.39 | 36.15 | 35.73
M555 | 34.79 | 35.22 | 34.43 | 36.29 | 35.90
M546 | 35.13 | 35.73 | 34.91 | 36.83 | 36.52
M537 | 35.87 | 36.45 | 35.50 | 37.52 | 37.32
M528 | 38.76 | 37.94 | 35.27 | 38.48 | 38.48
M519 | 33.77 | 33.86 | 31.04 | 33.98 | 34.07
[2] | 22.77 | 23.44 | 21.61 | 24.03 | 23.68
[3]1 | 13.72 | 13.85 | 12.48 | 13.84 | 13.67
[3]2 | 13.71 | 13.85 | 12.48 | 13.86 | 13.68
[3]3 | 14.09 | 14.19 | 12.72 | 14.35 | 14.16
[4] | 28.17 | 27.83 | 25.35 | 28.59 | 28.94
[5] | 38.73 | 39.09 | 36.70 | 38.73 | 38.61
[6]1 | 51.35 | 52.64 | 49.11 | 51.78 | 51.99
[6]2 | 59.41 | 59.47 | 54.20 | 58.56 | 58.80
[6]3 | 68.77 | 68.78 | 62.52 | 67.65 | 67.70

MSSIM:

Design | Lena×LenaRGB | Baboon×BaboonRGB | Goldhill×Goldhill | Goldhill×LenaRGB | Goldhill×BaboonRGB
M753 | 0.9985 | 0.9989 | 0.9966 | 0.9984 | 0.9980
M744 | 0.9985 | 0.9990 | 0.9965 | 0.9984 | 0.9980
M735 | 0.9986 | 0.9990 | 0.9966 | 0.9984 | 0.9980
M726 | 0.9988 | 0.9992 | 0.9960 | 0.9987 | 0.9983
M717 | 0.9987 | 0.9990 | 0.9943 | 0.9984 | 0.9980
M663 | 0.9957 | 0.9967 | 0.9855 | 0.9953 | 0.9943
M654 | 0.9957 | 0.9968 | 0.9851 | 0.9953 | 0.9943
M645 | 0.9958 | 0.9968 | 0.9855 | 0.9953 | 0.9943
M636 | 0.9960 | 0.9971 | 0.9858 | 0.9957 | 0.9947
M627 | 0.9962 | 0.9972 | 0.9846 | 0.9956 | 0.9944
M618 | 0.9955 | 0.9964 | 0.9742 | 0.9950 | 0.9929
M573 | 0.9813 | 0.9896 | 0.9631 | 0.9847 | 0.9827
M564 | 0.9814 | 0.9896 | 0.9629 | 0.9848 | 0.9827
M555 | 0.9814 | 0.9896 | 0.9614 | 0.9844 | 0.9822
M546 | 0.9815 | 0.9897 | 0.9616 | 0.9848 | 0.9826
M537 | 0.9825 | 0.9900 | 0.9577 | 0.9849 | 0.9823
M528 | 0.9902 | 0.9913 | 0.9444 | 0.9884 | 0.9848
M519 | 0.9846 | 0.9872 | 0.9226 | 0.9827 | 0.9778
[2] | 0.8630 | 0.8600 | 0.7214 | 0.7864 | 0.7994
[3]1 | 0.6534 | 0.7018 | 0.5411 | 0.6542 | 0.6626
[3]2 | 0.6550 | 0.7015 | 0.5416 | 0.6342 | 0.6507
[3]3 | 0.6239 | 0.6753 | 0.4938 | 0.6049 | 0.6035
[4] | 0.9367 | 0.9534 | 0.9464 | 0.9533 | 0.9478
[5] | 0.9897 | 0.9916 | 0.9645 | 0.9873 | 0.9827
[6]1 | 0.9995 | 0.9997 | 0.9982 | 0.9995 | 0.9994
[6]2 | 0.9999 | 0.9999 | 0.9990 | 0.9999 | 0.9998
[6]3 | 1.0000 | 1.0000 | 0.9998 | 1.0000 | 1.0000

To visualize the effect of approximate multiplication on image quality, multiplied images LenaRGB and Lena (using the considered multipliers) are shown in Fig. 6. The results indicate no obvious differences between the proposed designs and the exact design.

To comprehensively evaluate the efficiency of the discussed approximate designs in image processing, hardware cost and image quality should be considered simultaneously rather than through any single assessment. To quantify the practicality of the approximate multipliers, FOM2 is expressed in (11) [24]:

(11)
$FOM2=PDP/\left(MSSIM\times PSNR\right)$
Fig. 6. The multiplied images for LenaRGB and Lena using 8${\times}$8 multipliers.

A smaller FOM2 value indicates a better compromise between hardware efficiency and accuracy. To save space, FOM2 for the discussed multipliers is also shown in Fig. 5. The results indicate a decreasing trend. Among them, M627, M618, M537, M528, and M519 provided a better FOM2 than the other designs. Specifically, FOM2 for M528 takes first place, with a 63% reduction, on average, compared to the existing designs, followed by M519 and M537.
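A quick check of (11), with PDP from Table 3 and the Lena×LenaRGB column of Table 5 (the choice of column is ours), illustrates the ranking:

```python
def fom2(pdp_fj, mssim, psnr_db):
    # FOM2 = PDP / (MSSIM x PSNR), per (11); smaller is better.
    return pdp_fj / (mssim * psnr_db)

m528 = fom2(26.38, 0.9902, 38.76)   # proposed M528
ref2 = fom2(63.78, 0.8630, 22.77)   # design from [2]
```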

5. Conclusion

In this work, an ultra-efficient approximate 4-2 compressor was proposed by introducing symmetrical errors into the truth table of the exact compressor. A set of Dadda multipliers, denoted M${\alpha}$${\beta}$${\gamma}$, was designed to investigate the hardware/accuracy trade-off. Image multiplication was considered as an example to evaluate computing accuracy in a real application. Experimental results showed that the accuracy of a multiplier is mainly dominated by the exact part, while the hardware cost is affected by the approximate and truncated parts. Furthermore, two figures of merit show that a comprehensive indicator should be considered to reach a compromise between hardware and accuracy, because a multiplier with high accuracy consumes a large amount of energy. In addition, several of the proposed multipliers surpassed their counterparts under the considered criteria.

ACKNOWLEDGMENTS

This work was supported by the Fundamental Research Funds for the Central Universities of China (Grant No. JZ2020HGQA0162, Grant No. JZ2020HGTA0085).

REFERENCES

1
Angizi S., Jiang H., DeMara R. F., Han J., Fan D., 2018, Majority-Based Spin-CMOS Primitives for Approximate Computing, IEEE Transactions on Nanotechnology, Vol. 17, No. 4, pp. 795-806
2
Momeni A., Han J., Montuschi P., Lombardi F., 2015, Design and Analysis of Approximate Compressors for Multiplication, IEEE Transactions on Computers, Vol. 64, No. 4, pp. 984-994
3
Gorantla A., P D., 2017, Design of Approximate Compressors for Multiplication, ACM J. Emerg. Technol. Comput. Syst., Vol. 13, No. 3, Article 44
4
Venkatachalam S., Ko S., 2017, Design of Power and Area Efficient Approximate Multipliers, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 25, No. 5, pp. 1782-1786
5
Sabetzadeh F., Moaiyeri M., Ahmadinejad M., 2019, A Majority-Based Imprecise Multiplier for Ultra-Efficient Approximate Image Multiplication, IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 66, No. 11, pp. 4200-4208
6
Strollo A., Napoli E., Caro D., Petra N., Meo G., 2020, Comparison and Extension of Approximate 4-2 Compressors for Low-Power Approximate Multipliers, IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 67, No. 9, pp. 3021-3034
7
Esposito D., Strollo A. G. M., Napoli E., Caro D. D., Petra N., 2018, Approximate Multipliers Based on New Approximate Compressors, IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 65, No. 12, pp. 4169-4182
8
Akbari O., Kamal M., Afzali-Kusha A., Pedram M., 2017, Dual-Quality 4:2 Compressors for Utilizing in Dynamic Accuracy Configurable Multipliers, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 25, No. 4, pp. 1352-1361
9
Ahmadinejad M., Moaiyeri M. H., Sabetzadeh F., 2019, Energy and area efficient imprecise compressors for approximate multiplication at nanoscale, (in English), Aeu-International Journal of Electronics and Communications, Vol. 110
10
Salmanpour F., Moaiyeri M. H., Sabetzadeh F., 2021, Ultra-Compact Imprecise 4:2 Compressor and Multiplier Circuits for Approximate Computing in Deep Nanoscale, Circuits Systems and Signal Processing
11
Ha M., Lee S., Mar 2018, Multipliers With Approximate 4-2 Compressors and Error Recovery Modules, IEEE Embedded Systems Letters, Vol. 10, No. 1, pp. 6-9
12
Pei H., Yi X., Zhou H., He Y., Jan 2021, Design of Ultra-Low Power Consumption Approximate 4-2 Compressors Based on the Compensation Characteristic, IEEE Transactions on Circuits and Systems II-Express Briefs, Vol. 68, No. 1, pp. 461-465
13
Chiphong C., Jiangmin G., Mingyan Z., 2004, Ultra low-voltage low-power CMOS 4-2 and 5-2 compressors for fast arithmetic circuits, IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 51, No. 10, pp. 1985-1997
14
Yi X., Pei H., Zhang Z., Zhou H., He Y., 2019, Design of an Energy-Efficient Approximate Compressor for Error-Resilient Multiplications, in 2019 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-5
15
Liang J., Han J., Lombardi F., 2013, New Metrics for the Reliability of Approximate and Probabilistic Adders, IEEE Transactions on Computers, Vol. 62, No. 9, pp. 1760-1771
16
Guo W., Li S., 2021, Fast Binary Counters and Compressors Generated by Sorting Network, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 29, No. 6, pp. 1220-1230
17
Jiang H., Santiago F. J. H., Mo H., Liu L., Han J., 2020, Approximate Arithmetic Circuits: A Survey, Characterization, Recent Applications, Proceedings of the IEEE, Vol. 108, No. 12, pp. 2108-2135
18
Strollo A. G. M., Caro D. D., Napoli E., Petra N., Meo G. D., 2020, Low-Power Approximate Multiplier with Error Recovery using a New Approximate 4-2 Compressor, in 2020 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-4
19
Toan N. V., Lee J., 2019, Energy-Area-Efficient Approximate Multipliers for Error-Tolerant Applications on FPGAs, in 2019 32nd IEEE International System-on-Chip Conference (SOCC), pp. 336-341
20
Savithaa N., Poornima A., 2019, A High speed Area Efficient Compression technique of Dadda multiplier for Image Blending Application, in 2019 Third International conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), pp. 426-430
21
Savio M. M. D., Deepa T., 2020, Design of Higher Order Multiplier with Approximate Compressor, in 2020 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT), pp. 1-6
22
Zhou W., Bovik A. C., Sheikh H. R., Simoncelli E. P., 2004, Image quality assessment: from error visibility to structural similarity, IEEE Transactions on Image Processing, Vol. 13, No. 4, pp. 600-612
23
Ansari M. S., Jiang H., Cockburn B. F., Han J., 2018, Low-Power Approximate Multipliers Using Encoded Partial Products and Approximate Compressors, IEEE Journal on Emerging and Selected Topics in Circuits and Systems, Vol. 8, No. 3, pp. 404-416
24
Ahmadinejad M., Moaiyeri M. H., 2021, Energy- and Quality-Efficient Approximate Multipliers for Neural Network and Image Processing Applications, IEEE Transactions on Emerging Topics in Computing, pp. 1-1

Author

Yongqiang Zhang

Yongqiang Zhang received the B.S. degree in electronic science and technology from Anhui Jianzhu University, Hefei, China, in 2013, and the Ph.D. degree in integrated circuits and systems from the Hefei University of Technology, Hefei, in 2018. He was a Visiting Student with the Department of Electrical and Computer Engineering, University of Alberta, for one year. He is currently with the School of Microelectronics, Hefei University of Technology. His research interests include approximate computing, stochastic computing, VLSI design, and nanoelectronics circuits and systems.

Cong He

Cong He received her B.S. degree in Electronic Information and Engineering from Anhui Jianzhu University, Hefei, China, in 2019. She is currently pursuing the M.S. degree in Microelectronics with the Hefei University of Technology. Her research interests include approximate computing and emerging technologies in computing systems.

Xiaoyue Chen

Xiaoyue Chen received her B.S. degree in Electronic and Information Engineering from the Liaoning University of Engineering and Technology, Huludao, China, in 2021. She is currently pursuing the M.S. degree in Microelectronics with the Hefei University of Technology. Her research interests include approximate computing and stochastic computing.

Guangjun Xie

Guangjun Xie received the B.S. degree and M.S. degrees in microelectronics from the Hefei University of Technology, Hefei, China, in 1992 and 1995, respectively, and the Ph.D. degree in signal and information processing from the University of Science and Technology of China, Hefei, in 2002. He worked as a Post-Doctoral Researcher in optics with the University of Science and Technology of China from 2003 to 2005. He was a Senior Visitor with IMEC in 2007 and ASIC in 2011. He is currently a Professor with the School of Microelectronics, Hefei University of Technology. His research interests include integrated circuit design and nanoelectronics. Dr. Xie is a Senior Member of the Chinese Institute of Electronics.