CTHS Based Energy Efficient Thermal Aware Image ALU Design on FPGA

Tanesh Kumar1 · Bishwajeet Pandey2 · S. H. A. Mussavi1 · Noor Zaman3

Published online: 23 June 2015

Springer Science+Business Media New York 2015

Abstract Image ALU is a special type of ALU designed exclusively to perform arithmetic and logical operations on images. This Image ALU design is able to perform 14 operations. In this work, we propose a novel four-stage energy-efficient CTHS (C: Capacitance Scaling, T: Thermal Scaling, H: HSTL I/O Standard, S: SSTL I/O Standard) approach for low-power and thermal-aware Image ALU design. The CTHS technique achieves an 81.79 % reduction in power consumption, which is more than the power reduction of the ALU methods discussed in Shrivastava et al. (IEEE Trans Very Large Scale Integr Syst 18(6):988–997, 2010); Yoonjin and Mahapatra (IEEE Trans Very Large Scale Integr Syst 18(1):15–28, 2010); Chatterjee and Sachdev (IEEE Trans Very Large Scale Integr Syst 13(11):1296–1304, 2005); Wijeratne et al. (IEEE J Solid State Circuits 42(1):26–37, 2007); Nehru et al. (International conference on advances in engineering, science and management pp 145–149, 2012); Ho et al. (IEEE international symposium on circuits and systems pp 353–356, 2013); Rani et al. (3rd international conference on electronics computer technology pp 224–228, 2011). There is a 38.63 % reduction in I/O power and a 46.42 % reduction in leakage power when we scale down capacitance from 50 to 5 pF on a 28 nm Kintex-7 FPGA (Field Programmable Gate Array) at 100 GHz device operating frequency. There is a 67.05 % reduction in leakage power when we scale down ambient temperature from 50 to 10 °C at 100 GHz.

Corresponding author: Tanesh Kumar (tanesh.nust@gmail.com)

Bishwajeet Pandey (gyancity@gyancity.com) · S. H. A. Mussavi (dean@indus.edu.pk) · Noor Zaman (nzaman@kfu.edu.sa)

1 Indus University, Karachi, Pakistan

2 School of Electrical and Electronics Engineering, Chitkara University, Chandigarh, India

3

DOI 10.1007/s11277-015-2801-8

There are 5 different climates in the Köppen climate classification. We take 5 different ambient temperature values to approximately represent these 5 climates. Using a high profile heat sink and 500 LFM airflow, there is 75.39 % leakage power reduction from the last optimized result of capacitance scaling and 85.84 % leakage power reduction from the initial power dissipation. In the third stage, using the HSTL I/O standard, there is 64.53 % power reduction from the initial power dissipation. There is 41.06, 59.26 and 78.75 % power reduction from HSTL_II_DCI_18 to HSTL_I_12 at 100, 10 and 1 GHz respectively. In the fourth and final stage, using the SSTL I/O standard, there is 81.79 % power reduction from the initial power dissipation.

There is 61.83 % reduction in junction temperature when we apply 500 LFM airflow and a high profile heat sink compared to 250 LFM airflow and no heat sink. LFM stands for Linear Feet per Minute, a unit of airflow that helps us control the junction temperature of the FPGA. The unit of leakage power is watt (W) and the unit of junction temperature is degree Celsius (°C).

Keywords Low power design · Energy efficiency · Capacitance scaling · Thermal analysis · HSTL · SSTL · IO Standard · Real time image processing · FPGA

1 Introduction

CTHS is a novel approach to designing an energy-efficient Image ALU for a green digital image processor, as shown in Fig. 1. It consists of four methods: C stands for capacitance scaling, T stands for thermal approach, H stands for the high speed transceiver logic (HSTL) I/O standard [24] and S stands for the stub series terminated logic (SSTL) I/O standard [24]. In the first stage, capacitance scaling is applied to the target design, which is effective in reducing the I/O power of the design. In the second stage, the thermal approach is applied to the target design. The thermal approach consists of thermal scaling, selection of a heat sink and setting of the airflow in terms of LFM. It is effective in reducing the leakage power of the design. An I/O standard, whose primary purpose is to match impedance and avoid transmission line reflection, is also used here to reduce the I/O power of this design. In the third stage we apply HSTL [24], and in the final stage SSTL [24], as energy-efficient I/O standards for further reduction of I/O power. In this work, using the CTHS approach, we achieve 81.79 % power reduction from the initial power dissipation on a Kintex-7 FPGA at 100 GHz.

Fig. 1 Different stages of the energy-efficient CTHS approach: capacitance scaling, thermal approach, HSTL I/O standard and SSTL I/O standard

The capacitance ranges from 50 to 10 pF in steps of 10 pF; the last value is 5 pF, which is the default in the Xilinx architecture. The ambient temperature ranges from 50 to 10 °C in steps of 10 °C. In [1], there is 59 % power reduction of functional units using leakage-aware power gating. In [2], 39.72 % power reduction is achieved in a reconfigurable ALU using coarse-grained reconfigurable architectures. In [3], there is 22 % power reduction in a 32-bit testable ALU design using 180-nm bulk CMOS technology. In [4], there is 42 % power reduction in the integer execution core of the Pentium 4 using 65-nm CMOS technology. In [5], 70 % reduction in power dissipation is achieved in an ALU using a novel 8-transistor adder and multiplexers with pass transistor logic. In [6], 51 % energy efficiency is achieved when the SLTI approach is used for an asynchronous ALU. In [7], 70 % power reduction is achieved using a low power 10T 1-bit full adder. In [8], 70 % power reduction is achieved when the SLOP method is used for the ALU of a 32-bit MIPS processor.

Figure 2 shows a comparison between our proposed CTHS technique and 7 other energy-efficient techniques mentioned in [1–7].

Image ALU is a special type of ALU designed exclusively for image-specific operations. It has two input signals, i.e. two images. The other inputs are the clock (CLK) and the selection line (SEL), as shown in Figs. 3 and 4. Figure 4 shows the RTL schematic of the ALU. The selection line is 4 bits wide and can select up to 16 (i.e. 2^4) operations of the Image ALU. The processed image is the output obtained after a given operation of the ALU. According to [1], power saving in functional units (FUs) is a current research area for high-end superscalar processors because FUs consume significant processor energy and are an integral part of the processor. In [1], the Leakage Aware Power Gating and Leakage Aware Operation to FU Binding methodologies save 34 and 59 % of the ALU power of the ALPHA 21364 respectively. In [2], CGRA stands for Coarse Grained Reconfigurable Architecture. CGRAs consist of reprogrammable arrays of ALUs and a configuration cache to achieve high performance and flexibility. The energy-efficient design in [2] does not affect the performance and flexibility of the CGRA. In [2], the proposed approach reduces power in the configuration cache by 39.72 % with 2.16 % area overhead.

Fig. 2 Comparison of our proposed method with existing methods

In [3], a 32-bit ALU permits low-power operation and delay-fault testability with a design-for-test (DFT) scheme. The power-optimized methods allow 18 % power saving in the ALU for 180 nm CMOS. In addition, 22 % saving in standby-mode static power and 23 % lower peak current are possible. The integer execution core of the Pentium 4, operating at 9 GHz in 65-nm CMOS technology, is discussed in [4]. It results in 8.4 % saving in integer core normalized dynamic power and 42 % reduction in normalized static power. In [5], an ALU using a novel 8-transistor full adder and multiplexers with pass transistor logic is proposed. The power and the area of the ALU are 70 % less than those of the existing method. In [6], the asynchronous ALU based on the SLTI approach consumes about 51 % and 44 % less power than the reported PCHB on arithmetic and logic operations respectively. In [7], an energy-efficient 10T 1-bit full adder is used in the ALU. There is 70 % power and area reduction compared to the conventional design, so the design is area and energy efficient. In [8], power control signals are generated for different units, and the ALU is powered down. Simulation of a pipelined 32-bit MIPS processor shows that the SLOP method saves 70 % power.

After analysis of [9–23] from the perspective of image operations, we finalize all 16 operations of this Image ALU design from the large pool of available arithmetic and logical image operations. In [25], the specifications and characteristics of the different I/O standards (HSTL, SSTL, LVCMOS and LVDCI) available on Kintex-7, Virtex-7 and Artix-7 FPGAs are discussed. Ref. [25] mainly deals with the selection criteria for the I/O resources available in the FPGA. In [26], an LVCMOS I/O standard based low power ALU is designed with appropriate choice of drive strength and output driver supply voltage. In [27], a clock gated ALU is designed, in which the remaining 15 modules that are not in use are turned off while any one instruction is executing. In [28], mapping based design is considered for low power design: the clock enable signal is mapped either to the clock port or to a LUT. In both cases, the power requirement of the device is different whereas the functionality is the same. In [29], power intents of different architectures are described. In [30], four different I/O standards (SSTL, HSTL, LVCMOS and DCI) are applied to an ALU to achieve energy efficiency. In [31], Mobile DDR I/O standards are used for energy-efficient portable ALU design on FPGA. In this work, we use I/O standards other than Mobile DDR [31] for energy-efficient design.

Fig. 3 Top level schematic of image ALU

Fig. 4 Initial 3 stages of RTL schematic of image arithmetic logic units

A. Initial Waveform: The waveform of the Image ALU shown in Fig. 5 represents the initial state of the ALU, i.e. the state in which we are waiting for the first positive edge of the clock. That is why the processed image is not yet set and shows 0000000000000000.

B. Addition: The opcode of Addition is 0000, as shown in Fig. 6. Inputs are two 16-bit binary images, namely Image 1 and Image 2. The output image, i.e. Processed_Image, is of the same size as the input images.

C. Subtraction: The opcode of Subtraction is 0001, as shown in Fig. 7. Inputs are two 16-bit binary images, namely Image 1 and Image 2. Subtraction in image processing is used to detect changes between two images.

D. Left Shift by 1 (Image 1): The opcode of Left Shift Image 1 is 0010, as shown in Fig. 8. Input is one 16-bit binary image, Image 1. The left shift operation in image processing is used to increase the image contrast, like pixel multiplication.

E. Left Shift by 1 (Image 2): The opcode of Left Shift Image 2 is 0011, as shown in Fig. 9. Input is one 16-bit binary image, Image 2. As with the operation above, left shift on Image 2 is used to increase the contrast of Image 2.

F. OR: The opcode of OR is 0100, as shown in Fig. 10. Inputs are two 16-bit binary images, Image 1 and Image 2. The OR logical operation in image processing is used to compute the union of the images.

Fig. 5 Initial waveform of image ALU

Fig. 6 Waveform of image addition using image ALU

Fig. 7 Waveform of image subtraction using image ALU

Fig. 8 Waveform of left shift image 1 using image ALU

Fig. 9 Waveform of left shift image 2 using image ALU

Fig. 10 Waveform of OR logical operations using image ALU

G. AND: The opcode of AND is 0101, as shown in Fig. 11. Inputs are two 16-bit binary images, Image 1 and Image 2. The AND logical operation in image processing is used to compute the intersection of the images.

H. Complement Image 1: The opcode of Complement Image 1 is 1000, as shown in Fig. 12. Input is one 16-bit binary image, Image 1. The complement of an image produces its photographic negative.

I. Complement Image 2: The opcode of Complement Image 2 is 1001, as shown in Fig. 13. Input is one 16-bit binary image, Image 2. Complement Image 2 is used to produce the photographic negative of Image 2.

J. Division: The opcode of Division is 1010, as shown in Fig. 14. Inputs are two 16-bit binary images, namely Image 1 and Image 2. Division in image processing is used to compute the fractional change or ratio between corresponding pixel values (hence the common alternative name of ratioing).

Fig. 11 Waveform of AND logical operation using image ALU

Fig. 12 Waveform of complement image 1 using image ALU

Fig. 13 Waveform of complement image 2 using image ALU

Fig. 14 Waveform of image division using image ALU

K. NOR: The opcode of NOR is 1011, as shown in Fig. 15. Inputs are two 16-bit binary images, Image 1 and Image 2. NOR in image processing is used to compute the inverse of the union of the images.

L. NAND: The opcode of NAND is 1100, as shown in Fig. 16. Inputs are two 16-bit binary images, namely Image 1 and Image 2. The NAND logical operation in image processing is used to compute the inverse of the intersection of the images.

M. Power Transformation on Image 1: The opcode of Power Transformation of Image 1 is 1101, as shown in Fig. 17. Input is one 16-bit binary image, Image 1. It decreases the contrast of the image when c is greater than 1 and increases the contrast when c is less than 1.

N. Power Transformation on Image 2: The opcode of Power Transformation of Image 2 is 1110, as shown in Fig. 18. Input is one 16-bit binary image, Image 2. It decreases the contrast of Image 2 when c is greater than 1 and increases the contrast of Image 2 when c is less than 1.

O. Multiplication: The opcode of Multiplication is 1111, as shown in Fig. 19. Inputs are two 16-bit binary images, Image 1 and Image 2. Multiplication in image processing produces an output image in which the pixel values are those of Image 1 multiplied by the corresponding pixel values of Image 2.
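The opcode list above can be summarised as a small behavioural model. The following Python sketch is only an illustration of the opcode dispatch, not the authors' HDL: the function name `image_alu`, the truncation of results to 16 bits and the handling of division by zero are assumptions, and the two power transformation opcodes (1101, 1110) are left out because the text does not give their formula.

```python
# Behavioural sketch of the 16-bit image ALU (a software model, not the paper's HDL).
MASK = 0xFFFF  # assumption: results are truncated to the 16-bit image width

# Opcode table taken from the operation list in the text. Power transformation
# (1101, 1110) is omitted because its formula is not stated in the source.
OPS = {
    0b0000: lambda a, b: a + b,                # addition
    0b0001: lambda a, b: a - b,                # subtraction
    0b0010: lambda a, b: a << 1,               # left shift by 1 of Image 1
    0b0011: lambda a, b: b << 1,               # left shift by 1 of Image 2
    0b0100: lambda a, b: a | b,                # OR (union)
    0b0101: lambda a, b: a & b,                # AND (intersection)
    0b1000: lambda a, b: ~a,                   # complement of Image 1
    0b1001: lambda a, b: ~b,                   # complement of Image 2
    0b1010: lambda a, b: a // b if b else 0,   # division (0 on divide-by-zero: assumption)
    0b1011: lambda a, b: ~(a | b),             # NOR
    0b1100: lambda a, b: ~(a & b),             # NAND
    0b1111: lambda a, b: a * b,                # multiplication
}


def image_alu(sel: int, image1: int, image2: int) -> int:
    """Return the 16-bit processed image for the 4-bit opcode `sel`."""
    if sel not in OPS:
        raise ValueError(f"opcode {sel:04b} is not modelled in this sketch")
    # Mask the inputs to 16 bits, apply the operation, then mask the result.
    return OPS[sel](image1 & MASK, image2 & MASK) & MASK
```

For example, `image_alu(0b0100, 0x0F0F, 0x00FF)` returns `0x0FFF`, the bitwise union of the two inputs.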

Fig. 15 Waveform of NOR logical operation using image ALU

Fig. 16 Waveform of NAND logical operation using image ALU

Fig. 17 Waveform of power transformation on image 1 using image ALU

2 Results and Discussions with Capacitance Scaling

Power is directly proportional to the capacitance of a device. The output load is the sum of the pin capacitance and the device capacitance, and is measured in picofarads (pF). We consider 6 different output loads: 50, 40, 30, 20, 10 and 5 pF, as shown in Fig. 20.

Initially the capacitance is 50 pF; we then scale it to 40 pF and so on. Since power is directly proportional to capacitance, power consumption is maximum with 50 pF and minimum with 5 pF.
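The proportionality between power and capacitance used here matches the standard first-order relation for CMOS switching power. The paper does not write this formula out, so the expression below and its symbols (activity factor α, load capacitance C_L, output driver supply voltage V_CCO, switching frequency f) are assumptions added for illustration:

```latex
P_{\text{dyn}} = \alpha \, C_L \, V_{CCO}^{2} \, f
```

Under this relation, lowering C_L from 50 to 5 pF at fixed voltage and frequency lowers only the load-dependent part of the I/O power, which would be consistent with the reported I/O power falling by less than a factor of ten.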

2.1 Capacitance is Initially 50 pF

Table 1 shows the power dissipation and junction temperature of the Image ALU for 1, 10 and 100 GHz device operating frequency when the capacitance is 50 pF. We start capacitance scaling here and try to reduce the power dissipation of the overall circuit.

2.2 Capacitance Reduces to 40 pF

There is 1.45, 5.94 and 8.59 % power reduction when we scale down capacitance from 50 to 40 pF at 1, 10 and 100 GHz respectively. Table 2 shows the power dissipation and junction temperature of the Image ALU with a 40 pF output load.

Fig. 18 Waveform of power transformation on image 2 using image ALU

Fig. 19 Waveform of Multiplication of images using image ALU

Fig. 20 Capacitance scaling from 50 to 5 pF (50, 40, 30, 20, 10 and 5 pF)

2.3 Capacitance Reduces to 30 pF

There is 2.90, 11.89 and 17.17 % power reduction when we scale down capacitance from 50 to 30 pF at 1, 10 and 100 GHz respectively. Table 3 shows the power dissipation and junction temperature of the Image ALU with 30 pF.

2.4 Capacitance Reduces to 20 pF

There is 4.41, 17.81 and 25.76 % power reduction when we scale down capacitance from 50 to 20 pF at 1, 10 and 100 GHz respectively. Table 4 shows the power dissipation and junction temperature of the Image ALU with 20 pF.

2.5 Capacitance Reduces to 10 pF

There is 5.86, 23.75 and 34.34 % power reduction when we scale down capacitance from 50 to 10 pF at 1, 10 and 100 GHz respectively. Table 5 shows the power dissipation and junction temperature of the Image ALU with 10 pF.

Table 1 Power dissipation when capacitance is 50 pF

| GHz | IOs power (W) | Leakage power (W) | Junction temperature (°C) |
|---|---|---|---|
| 1 | 1.517 | 0.132 | 53.5 |
| 10 | 3.717 | 0.151 | 58.8 |
| 100 | 25.722 | 0.657 | 109 |

Table 2 Power dissipation when capacitance is 40 pF

| GHz | IOs power (W) | Leakage power (W) | Junction temperature (°C) |
|---|---|---|---|
| 1 | 1.495 | 0.132 | 53.5 |
| 10 | 3.496 | 0.149 | 58.3 |
| 100 | 23.513 | 0.573 | 104.3 |

Table 3 Power dissipation when capacitance is 30 pF

| GHz | IOs power (W) | Leakage power (W) | Junction temperature (°C) |
|---|---|---|---|
| 1 | 1.473 | 0.132 | 53.4 |
| 10 | 3.275 | 0.147 | 57.8 |
| 100 | 21.305 | 0.499 | 99.6 |

Table 4 Power dissipation when capacitance is 20 pF

| GHz | IOs power (W) | Leakage power (W) | Junction temperature (°C) |
|---|---|---|---|
| 1 | 1.450 | 0.132 | 53.4 |
| 10 | 3.055 | 0.146 | 57.8 |
| 100 | 19.096 | 0.434 | 99.6 |

2.6 Capacitance Reduces to 5 pF

There is 6.59, 26.74 and 38.63 % power reduction when we scale down capacitance from 50 to 5 pF at 1, 10 and 100 GHz respectively. Table 6 shows the power dissipation of the Image ALU with 5 pF.
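The percentages quoted in Sects. 2.2 to 2.6 can be recomputed from the I/O power columns of Tables 1 to 6. The sketch below is a convenience check rather than part of the original work; the dictionary layout and the helper name `reduction_pct` are assumptions. Recomputed values may differ from the quoted ones in the last digit because of rounding.

```python
# I/O power in watts from Tables 1-6, keyed by frequency (GHz) and output load (pF).
IO_POWER_W = {
    1:   {50: 1.517, 40: 1.495, 30: 1.473, 20: 1.450, 10: 1.428, 5: 1.417},
    10:  {50: 3.717, 40: 3.496, 30: 3.275, 20: 3.055, 10: 2.834, 5: 2.723},
    100: {50: 25.722, 40: 23.513, 30: 21.305, 20: 19.096, 10: 16.888, 5: 15.784},
}


def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after`."""
    return 100.0 * (before - after) / before


# Print the reduction relative to the 50 pF baseline for every scaled load.
for freq_ghz, by_cap in IO_POWER_W.items():
    base = by_cap[50]
    cells = [
        f"{cap} pF: {reduction_pct(base, power):.2f} %"
        for cap, power in by_cap.items()
        if cap != 50
    ]
    print(f"{freq_ghz} GHz -> " + ", ".join(cells))
```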

2.7 Effect of Capacitance Scaling and 1 GHz Operating Frequency

With a change in capacitance, there is no change in static power but there is a reduction in the I/O power consumption of the Image ALU.

There is 6.59 % reduction in I/O power when we scale down capacitance from 50 to 5 pF on a Kintex-7 FPGA at 1 GHz, as shown in Table 7 and Fig. 21. I/O power is greater than leakage power, as shown in Fig. 21.

2.8 Effect of Capacitance Scaling and 10 GHz Operating Frequency

With a change in capacitance, there is a minor change in leakage power but a significant saving in the I/O power consumption of the Image ALU, as shown in Fig. 22 and Table 8. Compared to the previous case, we increase the operating frequency of the Image ALU design tenfold, from 1 to 10 GHz.

There is 26.74 % reduction in I/O power and 4.63 % reduction in leakage power when we scale down capacitance from 50 to 5 pF on a Kintex-7 FPGA at 10 GHz.

Table 5 Power dissipation when capacitance is 10 pF

| GHz | IOs power (W) | Leakage power (W) | Junction temperature (°C) |
|---|---|---|---|
| 1 | 1.428 | 0.132 | 53.3 |
| 10 | 2.834 | 0.144 | 56.9 |
| 100 | 16.888 | 0.378 | 90.3 |

Table 6 Power dissipation when capacitance is 5 pF

| GHz | IOs power (W) | Leakage power (W) | Junction temperature (°C) |
|---|---|---|---|
| 1 | 1.417 | 0.132 | 53.3 |
| 10 | 2.723 | 0.144 | 56.7 |
| 100 | 15.784 | 0.378 | 88.0 |

Table 7 Power and junction temperature on 1 GHz

| | 50 pF | 40 pF | 30 pF | 20 pF | 5 pF |
|---|---|---|---|---|---|
| IO power (W) | 1.517 | 1.495 | 1.473 | 1.450 | 1.417 |
| Leakage power (W) | 0.132 | 0.132 | 0.132 | 0.132 | 0.132 |
| Junction temperature (°C) | 53.5 | 53.5 | 53.4 | 53.4 | 53.3 |

2.9 Effect of Capacitance Scaling and 100 GHz Operating Frequency

With a change in capacitance, there is a reduction in both the leakage power and the I/O power consumption of the Image ALU.

There is 38.63 % reduction in I/O power and 46.42 % reduction in leakage power when we scale down capacitance from 50 to 5 pF on a Kintex-7 FPGA at 100 GHz, as shown in Fig. 23 and Table 9.

3 Thermal Approach for Power Reduction

The thermal approach for energy-efficient design is applied in the following three stages, as shown in Fig. 24.

Temperature scaling, selection of an appropriate heat sink and selection of an appropriate airflow are the 3 stages, applied one by one in order to reduce the overall power consumption. There are 5 different climates in the Köppen climate classification. We take 5 different ambient temperature values to approximately represent these 5 climates: 50, 40, 30, 20 and 10 °C.
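The three thermal stages can be related through the standard steady-state thermal model of a packaged device. This relation is not stated in the paper; the symbols below (junction temperature T_J, ambient temperature T_A, junction-to-ambient thermal resistance θ_JA, total dissipated power P_total) are assumptions used only to make the reasoning explicit:

```latex
T_J = T_A + \theta_{JA} \, P_{\text{total}}
```

In this picture, temperature scaling lowers T_A directly, while a larger heat sink and higher airflow lower θ_JA. Either way the junction temperature drops, and since leakage power increases with junction temperature, the leakage power drops as well.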

Fig. 21 Effect of capacitance and 1 GHz frequency on power dissipation of image ALU (series: IO power, leakage power)

Fig. 22 Effect of capacitance and 10 GHz frequency on power dissipation of image ALU (series: IO power, leakage power)

Table 8 Power and junction temperature on 10 GHz

| | 50 pF | 40 pF | 30 pF | 20 pF | 5 pF |
|---|---|---|---|---|---|
| IO power (W) | 3.717 | 3.496 | 3.275 | 3.055 | 2.723 |
| Leakage power (W) | 0.151 | 0.149 | 0.147 | 0.146 | 0.144 |
| Junction temperature (°C) | 58.8 | 58.3 | 57.8 | 57.4 | 56.7 |

3.1 Ambient Temperature is Initially 50 °C

Ambient temperature is the room temperature. According to the World Meteorological Organization, the highest temperature recorded on the Asian continent is 54 °C, measured in Tirat Tsvi, Israel in 1942. We cover this extreme case by taking the ambient temperature as 50 °C. Table 10 shows the I/O power, leakage power and junction temperature when the ambient temperature is 50 °C.

3.2 Ambient Temperature Reduces to 40 °C

Our lab is located in Delhi, India, where the highest average summer temperature is approximately 40 °C, so we consider this case as well. Power and junction temperature when the ambient temperature is 40 °C are shown in Table 11.

Fig. 23 Effect of capacitance and 100 GHz frequency on power dissipation of image ALU (series: IO power, leakage power)

Table 9 Power and junction temperature on 100 GHz

| | 50 pF | 40 pF | 30 pF | 20 pF | 5 pF |
|---|---|---|---|---|---|
| IO power (W) | 25.722 | 23.513 | 21.305 | 19.096 | 15.784 |
| Leakage power (W) | 0.657 | 0.573 | 0.499 | 0.434 | 0.352 |
| Junction temperature (°C) | 109 | 104.3 | 99.6 | 95.0 | 90.3 |

Fig. 24 Thermal approach for power reduction. Temperature scaling: initially 50 °C, then reduced to 40, 30, 20 and 10 °C. Heat sink selection: none, custom, low profile, medium profile, high profile. Airflow: 250 LFM, 500 LFM

3.3 Ambient Temperature Reduces to 30 °C

Power and junction temperature when the ambient temperature is 30 °C are shown in Table 12. Both power and temperature are maximum when the device operating frequency is 100 GHz and minimum at 1 GHz.

3.4 Ambient Temperature Reduces to 20 °C

Power and junction temperature when the ambient temperature is 20 °C are shown in Table 13. 20 °C is an average temperature in Delhi, where our lab is situated.

3.5 Ambient Temperature Reduces to 10 °C

Power and junction temperature when the ambient temperature is 10 °C are shown in Table 14. The ambient temperature near our lab is 10 °C in winter.

3.6 Effect of Ambient Temperature Scaling on 1 GHz

With a change in ambient temperature, there is no change in I/O power but there is a significant reduction in the leakage power consumption of the Image ALU, as shown in Table 15 and Fig. 25.

The unit of junction temperature is degree Celsius (°C).

There is 40.90 % reduction in leakage power when we scale down ambient temperature from 50 to 10 °C on a Kintex-7 FPGA at 1 GHz.

Table 10 Power and junction temperature at 50 °C

| GHz | IOs power (W) | Leakage power (W) | Junction temperature (°C) |
|---|---|---|---|
| 1 | 1.417 | 0.132 | 53.3 |
| 10 | 2.723 | 0.144 | 56.7 |
| 100 | 15.784 | 0.378 | 88.0 |

Table 11 Power and junction temperature at 40 °C

| GHz | IOs power (W) | Leakage power (W) | Junction temperature (°C) |
|---|---|---|---|
| 1 | 1.417 | 0.106 | 43.3 |
| 10 | 2.723 | 0.113 | 46.6 |
| 100 | 15.784 | 0.259 | 77.8 |

Table 12 Power and junction temperature at 30 °C

| GHz | IOs power (W) | Leakage power (W) | Junction temperature (°C) |
|---|---|---|---|
| 1 | 1.417 | 0.089 | 33.2 |
| 10 | 2.723 | 0.094 | 36.6 |
| 100 | 15.784 | 0.193 | 67.7 |

3.7 Effect of Ambient Temperature Scaling on 10 GHz

With a change in ambient temperature, there is no change in I/O power but there is a significant reduction in the leakage power consumption of the Image ALU.

There is 49.65 % reduction in leakage power when we scale down ambient temperature from 50 to 10 °C on a Kintex-7 FPGA at 10 GHz, as shown in Table 16 and Fig. 26.

Table 13 Power and junction temperature at 20 °C

| GHz | IOs power (W) | Leakage power (W) | Junction temperature (°C) |
|---|---|---|---|
| 1 | 1.417 | 0.078 | 23.2 |
| 10 | 2.723 | 0.081 | 26.6 |
| 100 | 15.784 | 0.147 | 57.6 |

Table 14 Power and junction temperature at 10 °C

| GHz | IOs power (W) | Leakage power (W) | Junction temperature (°C) |
|---|---|---|---|
| 1 | 1.417 | 0.070 | 13.2 |
| 10 | 2.723 | 0.072 | 16.6 |
| 100 | 15.784 | 0.116 | 47.5 |

Table 15 Power and junction temperature on 1 GHz

| | 50 °C | 40 °C | 30 °C | 20 °C | 10 °C |
|---|---|---|---|---|---|
| IO power (W) | 1.417 | 1.417 | 1.417 | 1.417 | 1.417 |
| Leakage power (W) | 0.132 | 0.106 | 0.089 | 0.078 | 0.078 |
| Junction temperature (°C) | 53.3 | 43.3 | 33.2 | 23.2 | 13.2 |

Fig. 25 Thermal approach for power reduction on 1 GHz

3.8 Effect of Ambient Temperature Scaling on 100 GHz

With a change in ambient temperature, there is no change in I/O power but there is a significant reduction in the leakage power consumption of the Image ALU, as shown in Fig. 27 and Table 17.

There is 67.05 % reduction in leakage power when we scale down ambient temperature from 50 to 10 °C on a Kintex-7 FPGA at 100 GHz.
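The leakage savings quoted for temperature scaling follow from the 50 °C and 10 °C leakage columns of Tables 15 to 17. The short sketch below recomputes them as a check; the variable names are assumptions, and the values are copied from those tables as printed.

```python
# Leakage power in watts at 50 and 10 degrees C ambient, per operating
# frequency in GHz, copied from the leakage rows of Tables 15, 16 and 17.
LEAKAGE_W = {
    1: (0.132, 0.078),
    10: (0.143, 0.072),
    100: (0.352, 0.116),
}


def leakage_reduction_pct(freq_ghz: int) -> float:
    """Percentage leakage reduction when ambient drops from 50 to 10 degrees C."""
    at_50c, at_10c = LEAKAGE_W[freq_ghz]
    return 100.0 * (at_50c - at_10c) / at_50c


for freq_ghz in LEAKAGE_W:
    print(f"{freq_ghz} GHz: {leakage_reduction_pct(freq_ghz):.2f} % less leakage power")
```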

Table 16 Power and junction temperature on 10 GHz

| | 50 °C | 40 °C | 30 °C | 20 °C | 10 °C |
|---|---|---|---|---|---|
| IO power (W) | 2.723 | 2.723 | 2.723 | 2.723 | 2.723 |
| Leakage power (W) | 0.143 | 0.113 | 0.094 | 0.081 | 0.072 |
| Junction temperature (°C) | 56.7 | 46.6 | 36.6 | 26.6 | 16.6 |

Fig. 26 Thermal approach for power reduction on 10 GHz

3.9 Effect of Heat Sink on Overall Power for 100 GHz

There is 7.76 % reduction in leakage power with the high profile heat sink compared to the power consumption at 100 GHz with the medium profile heat sink, as shown in Table 18. The unit of leakage power is watt (W) and the unit of junction temperature is degree Celsius (°C).

3.10 Effect of Airflow on Overall Power

When we increase the airflow to 500 LFM, there is a further 13.09 % reduction in leakage power consumption. There is 19.32 % power reduction from the last optimized result of temperature scaling, as shown in Table 19 and Fig. 28.

There is 75.39 % leakage power reduction from the last optimized result of capacitance scaling, and 85.84 % leakage power reduction from the initial power dissipation.

Without a heat sink, the junction temperature is 95.1 °C, which reduces to 43.8 °C when we use the high profile heat sink at 250 LFM. Without a heat sink, the junction temperature is 82.1 °C, which reduces to 36.3 °C when we use the high profile heat sink at 500 LFM.

Therefore, there is 61.83 % reduction in junction temperature when we apply 500 LFM airflow and a high profile heat sink compared to 250 LFM airflow and no heat sink, as shown in Fig. 29 and Table 20.
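The 61.83 % figure compares the worst and best entries of Table 20. A minimal sketch of that comparison follows; the dictionary layout and names are assumptions, and the temperatures are the Table 20 values in degrees Celsius.

```python
# Junction temperature in degrees C from Table 20: heat sink -> (250 LFM, 500 LFM).
JUNCTION_TEMP_C = {
    "None": (95.1, 82.1),
    "Custom": (50.3, 47.2),
    "Low profile": (51.8, 43.8),
    "Medium profile": (47.5, 39.1),
    "High profile": (43.8, 36.3),
}

# Worst case: no heat sink at 250 LFM. Best case: high profile heat sink at 500 LFM.
worst_c = JUNCTION_TEMP_C["None"][0]
best_c = JUNCTION_TEMP_C["High profile"][1]
junction_reduction_pct = 100.0 * (worst_c - best_c) / worst_c

print(f"{worst_c} C -> {best_c} C: {junction_reduction_pct:.2f} % lower junction temperature")
```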

4 HSTL IO Standard for Power Reduction

HSTL is a technology-independent standard for signaling between ICs. The nominal signaling range is 1.2–2.5 V. In the third stage, using the HSTL I/O standard, there is 64.53 % power reduction from the initial power dissipation.

4.1 HSTL is Initially HSTL_II_DCI_18

The I/O standard HSTL_II_DCI_18 is a DCI variant of HSTL class II using 1.8 V. The power consumption with a high profile heat sink, the HSTL_II_DCI_18 I/O standard and 500 LFM airflow is shown in Table 21.

4.2 Migrate to HSTL_I_DCI for Energy Efficiency

The I/O standard HSTL_I_DCI is a DCI variant of HSTL class I. It is more power efficient than HSTL_II_DCI_18. The power dissipation and junction temperature with HSTL_I_DCI are shown in Table 22.

Fig. 27 Thermal approach for power reduction on 100 GHz

Table 17 Power and junction temperature on 100 GHz

| | 50 °C | 40 °C | 30 °C | 20 °C | 10 °C |
|---|---|---|---|---|---|
| IO power (W) | 15.784 | 15.784 | 15.784 | 15.784 | 15.784 |
| Leakage power (W) | 0.352 | 0.259 | 0.193 | 0.147 | 0.116 |
| Junction temperature (°C) | 88 | 77.8 | 67.7 | 57.6 | 47.5 |

Table 18 Power and junction temperature on 250 LFM

| | None | Custom | Low profile | Medium profile | High profile |
|---|---|---|---|---|---|
| Leakage power (W) | 0.436 | 0.123 | 0.127 | 0.116 | 0.107 |
| Junction temperature (°C) | 95.1 | 50.3 | 51.8 | 47.5 | 43.8 |

Table 19 Power reduction with increase in airflow (leakage power in W)

| Heat sink | 250 LFM | 500 LFM |
|---|---|---|
| None | 0.436 | 0.294 |
| Custom | 0.123 | 0.115 |
| Low profile | 0.127 | 0.107 |
| Medium profile | 0.116 | 0.098 |
| High profile | 0.107 | 0.093 |

Fig. 28 Air flow and heat sink for power reduction (series: 250 LFM, 500 LFM)

Fig. 29 Air flow and heat sink for junction temperature reduction (chart title: Temperature Versus Heat Sink; series: 250 LFM, 500 LFM)

Table 20 Reduction in junction temperature with increase in airflow (junction temperature in °C)

| Heat sink | 250 LFM | 500 LFM |
|---|---|---|
| None | 95.1 | 82.1 |
| Custom | 50.3 | 47.2 |
| Low profile | 51.8 | 43.8 |
| Medium profile | 47.5 | 39.1 |
| High profile | 43.8 | 36.3 |

Table 21 Power dissipation with HSTL_II_DCI_18

| GHz | IOs power (W) | Leakage power (W) | Junction temperature (°C) |
|---|---|---|---|
| 1 | 1.417 | 0.070 | 12.2 |
| 10 | 2.723 | 0.071 | 14.6 |
| 100 | 15.784 | 0.093 | 36.3 |

Table 22 Power dissipation with HSTL_I_DCI

| GHz | IOs power (W) | Leakage power (W) | Junction temperature (°C) |
|---|---|---|---|
| 1 | 0.775 | 0.069 | 11.3 |
| 10 | 1.868 | 0.070 | 13.4 |
| 100 | 12.798 | 0.087 | 32.0 |

Table 23 Power dissipation with HSTL_II_18

| GHz | IOs power (W) | Leakage power (W) | Junction temperature (°C) |
|---|---|---|---|
| 1 | 0.321 | 0.069 | 10.6 |
| 10 | 1.389 | 0.070 | 12.6 |
| 100 | 12.060 | 0.098 | 39.4 |

Table 24 Power dissipation with HSTL_II_DCI

| GHz | IOs power (W) | Leakage power (W) | Junction temperature (°C) |
|---|---|---|---|
| 1 | 0.980 | 0.069 | 11.6 |
| 10 | 1.761 | 0.070 | 13.2 |
| 100 | 9.575 | 0.081 | 27.4 |

Table 25 Power dissipation with HSTL_I_12

| GHz | IOs power (W) | Leakage power (W) | Junction temperature (°C) |
|---|---|---|---|
| 1 | 0.248 | 0.068 | 10.5 |
| 10 | 1.069 | 0.069 | 12.2 |
| 100 | 9.276 | 0.081 | 27.1 |

Table 26 Power comparison with HSTL

| GHz | HSTL_II_DCI_18 (W) | HSTL_I_12 (W) |
|---|---|---|
| 1 | 1.417 | 0.248 |
| 10 | 2.723 | 1.069 |
| 100 | 15.784 | 9.276 |

4.3 Migrate to HSTL_II_18 for Energy Efficiency

The I/O standard HSTL_II_18 is a 1.8 V variant of HSTL class II. The power dissipation and junction temperature with HSTL_II_18 are shown in Table 23. It is more power efficient than HSTL_I_DCI.

4.4 Migrate to HSTL_II_DCI for Energy Efficiency

HSTL_II_DCI and HSTL_II_DCI_18 provide on-chip Thevenin termination powered from the output driver supply voltage (VCCO). This creates an equivalent termination voltage of VCCO/2 and is used in bidirectional links. The power dissipation and junction temperature with HSTL_II_DCI are shown in Table 24.

4.5 Migrate to HSTL_I_12 for Energy Efficiency

The I/O standard HSTL_I_12 is a 1.2 V variant of HSTL class I. It is more power efficient than HSTL_II_DCI and provides the maximum reduction in power dissipation. The power dissipation and junction temperature with HSTL_I_12 are shown in Table 25.

4.6 Effect of HSTL on Power Dissipation of Image ALU Design

There is 41.06, 59.26 and 78.75 % power reduction from HSTL_II_DCI_18 to HSTL_I_12 at 100, 10 and 1 GHz respectively, as shown in Table 26 and Fig. 30.

There is a significant reduction in I/O power with HSTL but only a minor variation in junction temperature.

There is 13.94, 16.43 and 25.34 % junction temperature reduction from HSTL_II_DCI_18 to HSTL_I_12 at 1, 10 and 100 GHz respectively, as shown in Table 27 and Fig. 31.
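The power reductions quoted for the HSTL stage appear to be reproduced when "power" is taken as I/O power plus leakage power from Tables 21 and 25. That reading is an inference from the table values, not something the text states, and the names in the sketch below are assumptions.

```python
# (I/O power, leakage power) in watts per frequency in GHz.
HSTL_II_DCI_18_W = {1: (1.417, 0.070), 10: (2.723, 0.071), 100: (15.784, 0.093)}  # Table 21
HSTL_I_12_W = {1: (0.248, 0.068), 10: (1.069, 0.069), 100: (9.276, 0.081)}        # Table 25


def total_power_reduction_pct(freq_ghz: int) -> float:
    """Reduction in I/O + leakage power when migrating to HSTL_I_12."""
    before = sum(HSTL_II_DCI_18_W[freq_ghz])
    after = sum(HSTL_I_12_W[freq_ghz])
    return 100.0 * (before - after) / before


for freq_ghz in (100, 10, 1):
    print(f"{freq_ghz} GHz: {total_power_reduction_pct(freq_ghz):.2f} % total power reduction")
```

Using I/O power alone gives somewhat larger percentages, which is why the combined figure is used here.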

Fig. 30 Effect of HSTL on power dissipation of image ALU (series: HSTL_II_DCI_18, HSTL_I_12)

Table 27 Junction temperature with HSTL

| GHz | HSTL_II_DCI_18 (°C) | HSTL_I_12 (°C) |
|---|---|---|
| 1 | 12.2 | 10.5 |
| 10 | 14.6 | 12.2 |
| 100 | 36.3 | 27.1 |

5 SSTL IO Standard for Power Reduction

In the fourth and final stage, using the SSTL I/O standard, there is 81.79 % power reduction from the initial power dissipation. SSTL stands for Stub Series Terminated Logic. We consider 4 different SSTL standards in this work.

5.1 Migrate to SSTL135_R for Energy Efficiency

The output driver supply voltage of SSTL135 is 1.35 V. R stands for reduced drive strength. Power dissipation with SSTL135_R is shown in Table 28.

5.2 Migrate to SSTL12_DCI for Energy Efficiency

The output driver supply voltage of SSTL12_DCI is 1.2 V. DCI stands for digitally controlled impedance. I/O power dissipation with SSTL12_DCI is shown in Table 29.

5.3 Migrate to SSTL135_DCI for Energy Efficiency

The output driver supply voltage of SSTL135_DCI is 1.35 V. DCI stands for digitally controlled impedance. I/O power dissipation with SSTL135_DCI is shown in Table 30.

Fig. 31 Effect of HSTL on junction temperature of image ALU (series: HSTL_II_DCI_18, HSTL_I_12)

Table 28 Power dissipation with SSTL135_R

| GHz | IOs power (W) | Leakage power (W) | Junction temperature (°C) |
|---|---|---|---|
| 1 | 0.156 | 0.068 | 10.8 |
| 10 | 0.758 | 0.069 | 11.7 |
| 100 | 6.774 | 0.083 | 28.5 |

Table 29 Power dissipation with SSTL12_DCI

| GHz | IOs power (W) | Leakage power (W) | Junction temperature (°C) |
|---|---|---|---|
| 1 | 0.500 | 0.068 | 11.3 |
| 10 | 0.912 | 0.069 | 12.0 |
| 100 | 5.029 | 0.075 | 11.3 |

5.4 Migrate to SSTL12 for Energy Efficiency

The VCCO of SSTL12 is 1.2 V and the reference voltage is 0.6 V. I/O power dissipation with SSTL12 is shown in Table 31.

5.5 Effect of SSTL on Power Dissipation of Image ALU Design

There is 19.52 and 29.98 % power reduction from SSTL135_R to SSTL12 at 10 and 100 GHz respectively, as shown in Table 32 and Fig. 32.

Table 30 Power dissipation with SSTL135_DCI

Frequency (GHz)   I/O power (W)   Leakage power (W)   Junction temperature (°C)
1                 0.452           0.068               10.8
10                0.864           0.069               11.9
100               4.993           0.075               20.8

Table 31 Power dissipation with SSTL12

Frequency (GHz)   I/O power (W)   Leakage power (W)   Junction temperature (°C)
1                 0.198           0.068               10.5
10                0.610           0.069               11.6
100               4.726           0.075               20.5

Table 32 Power comparison with SSTL

Frequency (GHz)   SSTL135_R (W)   SSTL12 (W)
1                 0.156           0.198
10                0.758           0.610
100               6.774           4.726

Fig. 32 Effect of SSTL on power dissipation of image ALU (series: SSTL135_R, SSTL12)


SSTL gives a significant reduction in I/O power, but at 1 and 10 GHz the junction temperature varies only slightly.

From SSTL135_R to SSTL12, the junction temperature falls by 28.07 % at 100 GHz (28.5 to 20.5 °C) but by less than 1 % at 10 GHz (11.7 to 11.6 °C), as shown in Table 33 and Fig. 33.
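The junction temperature change between the two standards can be checked directly against Table 33. The sketch below reports the absolute drop in degrees Celsius alongside the relative drop, since a percentage of a Celsius reading depends on where the scale's zero lies; the names `junction_temp_c` and `temp_change` are illustrative.

```python
# Junction temperature in degrees Celsius, copied from Table 33.
junction_temp_c = {
    1:   {"SSTL135_R": 10.8, "SSTL12": 10.5},
    10:  {"SSTL135_R": 11.7, "SSTL12": 11.6},
    100: {"SSTL135_R": 28.5, "SSTL12": 20.5},
}

# For each frequency store (absolute drop in deg C, relative drop in percent).
temp_change = {}
for freq, row in junction_temp_c.items():
    drop_c = row["SSTL135_R"] - row["SSTL12"]
    temp_change[freq] = (drop_c, drop_c / row["SSTL135_R"] * 100.0)

for freq, (drop_c, drop_pct) in temp_change.items():
    print(f"{freq:>3} GHz: {drop_c:4.1f} deg C lower ({drop_pct:5.2f} %)")
```

With the tabulated values the drop is 8.0 °C (about 28.07 %) at 100 GHz and 0.1 °C (about 0.85 %) at 10 GHz.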

Table 33 Junction temperature with SSTL

Frequency (GHz)   SSTL135_R (°C)   SSTL12 (°C)
1                 10.8             10.5
10                11.7             11.6
100               28.5             20.5

Fig. 33 Effect of SSTL on junction temperature of image ALU (vertical axis: junction temperature in °C; series: SSTL135_R, SSTL12)

6 Conclusion

Our simulation results show a 38.63 % reduction in I/O power and a 46.42 % reduction in leakage power when capacitance is scaled down from 50 to 5 pF on the Kintex-7 FPGA at 100 GHz. For the same capacitance scaling, the power reduction is 6.59, 26.74 and 38.63 % at 1, 10 and 100 GHz respectively. With a change in ambient temperature, there is no change in I/O power but there is a significant saving in the leakage power consumption of the Image ALU. We also achieved a 67.05 % reduction in I/O power when ambient temperature is scaled down from 50 to 10 °C on the Kintex-7 FPGA at 100 GHz. Using a high-profile heat sink and 500 LFM airflow, a 75.39 % leakage power reduction is achieved relative to the last optimized result of capacitance scaling, and an 85.84 % leakage power reduction relative to the initial power dissipation. In the 3rd stage, using the HSTL I/O standard, we achieved a 64.53 % power reduction from the initial power dissipation. We also achieved 41.06, 59.26 and 78.75 % power reduction from HSTL_II_DCI_18 to HSTL_I_12 at 100, 10 and 1 GHz respectively. In the 4th and final stage, using the SSTL I/O standard, there is an 81.79 % power reduction from the initial power dissipation. There is a 19.52 % and 29.98 % power reduction from SSTL135_R to SSTL12 at 10 and 100 GHz respectively.

This work is implemented on a 28 nm FPGA. The International Technology Roadmap for Semiconductors and PTM defined the specification of 28 nm in 2005, but Xilinx introduced the 28 nm 7 series Virtex-7, Kintex-7 and Artix-7 families in June 2010. As of 2013, the latest FPGA available in the market is 28 nm, whereas PTM LP for energy efficient design was released for 16 nm in 2008 and for 7 nm in 2012. However, no commercial 16 or 7 nm FPGA is available in the market. There is open scope to redesign this work on future 20 nm, 16 nm or smaller UltraScale FPGAs. We applied the CTHS approach for energy efficient design; there is wide scope to implement other energy efficient techniques such as mapping, clock gating, power gating, and the LVCMOS, LVDCI and Mobile Buffer I/O standards for further reduction of I/O power. Here the base circuit is an image ALU; the same approach can be used to make other components of a digital image processor energy efficient.

In this work, Verilog is used.

References

1. Shrivastava, A., Kannan, D., Bhardwaj, S., & Vrudhula, S. (2010). Reducing functional unit power consumption and its variation using leakage sensors. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 18(6), 988–997.

2. Yoonjin, K., & Mahapatra, R. N. (2010). Dynamic context compression for low-power coarse-grained reconfigurable architecture. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 18(1), 15–28.

3. Chatterjee, B., & Sachdev, M. (2005). Design of a 1.7-GHz low-power delay-fault-testable 32-b ALU in 180-nm CMOS technology. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 13(11), 1296–1304.

4. Wijeratne, S. B., et al. (2007). A 9-GHz 65-nm Intel Pentium 4 Processor Integer Execution Unit. IEEE Journal of Solid-State Circuits, 42(1), 26–37.

5. Nehru, K., Shanmugam, A., & Thenmozhi, G. D. (2012). Design of low power ALU using 8T FA and PTL based MUX circuits. In international conference on advances in engineering, science and management (ICAESM) (pp. 145–149).

6. Ho, W., Chong, K., Gwee, B., & Chang, J. S. (2013). Low power sub-threshold asynchronous QDI Static Logic Transistor-level Implementation (SLTI) 32-bit ALU. In IEEE international symposium on circuits and systems (ISCAS) (pp. 353–356).

7. Rani, T. E., Rani, M. A., & Rao, R. (2011). AREA optimized low power arithmetic and logic unit. In 3rd international conference on electronics computer technology (ICECT) (pp. 224–228).

8. Kulkarni, M., Sheth, K., & Agrawal, V. D. (2011). Architectural power management for high leakage technologies. In IEEE 43rd southeastern symposium on system theory (SSST) (pp. 67–72).

9. Zhang, B., Mei, K., & Zheng, N. (2013). Reconfigurable processor for binary image processing. IEEE Transactions on Circuits and Systems for Video Technology, 23(5), 823–831.

10. Shan, D., Ibrahim, M., Shehata, M., & Badawy, W. (2013). Automatic license plate recognition (ALPR): A state-of-the-art review. IEEE Transactions on Circuits and Systems for Video Technology, 23(5), 311–325.

11. Sullivan, G. J., Ohm, J., Han, W. J., & Wiegand, T. (2012). Overview of the high efficiency video coding (HEVC) standard. IEEE Transactions on Circuits and Systems for Video Technology, 22(12), 1649.

12. Chen, S. L. (2013). VLSI implementation of an adaptive edge-enhanced image scalar for real-time multimedia applications. IEEE Transactions on Circuits and Systems for Video Technology, 23(9), 1510–1522.

13. Bossen, F., Bross, B., Suhring, K., & Flynn, D. (2012). HEVC complexity and implementation analysis. IEEE Transactions on Circuits and Systems for Video Technology, 22(12), 1685–1696.

14. Pushe, Z., Hongbo, Z., He, L., & Shibata, T. (2013). A directional-edge-based real-time object tracking system employing multiple candidate-location generation. IEEE Transactions on Circuits and Systems for Video Technology, 23(3), 503–517.

15. Wang, L., et al. (2013). Edge-directed single-image super-resolution via adaptive gradient magnitude self-interpolation. IEEE Transactions on Circuits and Systems for Video Technology, 23(8), 1289–1299.

16. Lainema, J., Bossen, F., Woo-Jin, H., Junghye, M., & Ugur, K. (2012). Intra coding of the HEVC standard. IEEE Transactions on Circuits and Systems for Video Technology, 22(12), 1792–1801.

17. Tian, J. (2013). Reversible data embedding using a difference expansion. IEEE Transactions on Circuits and Systems for Video Technology, 13(8), 890–896.


18. Choi, M., Chang, I. J., & Kim, J. (2013). High performance and hardware efficient multiview video coding frame scheduling algorithms and architectures. IEEE Transactions on Circuits and Systems for Video Technology, 23(8), 1312–1321.

19. Sjoberg, R., et al. (2012). Overview of HEVC high-level syntax and reference picture management. IEEE Transactions on Circuits and Systems for Video Technology, 22(12), 1858–1870.

20. Peng, W. H., & Chen, C. C. (2013). An interframe prediction technique combining template matching prediction and block-motion compensation for high-efficiency video coding. IEEE Transactions on Circuits and Systems for Video Technology, 23(8), 1432–1446.

21. Jung, S. W. (2013). Enhancement of image and depth map using adaptive joint trilateral filter. IEEE Transactions on Circuits and Systems for Video Technology, 23(2), 258–269.

22. Zhou, M., et al. (2012). HEVC lossless coding and improvements. IEEE Transactions on Circuits and Systems for Video Technology, 22(12), 1839–1843.

23. Chuan, Q. (2013). An inpainting-assisted reversible steganographic scheme using a histogram shifting mechanism. IEEE Transactions on Circuits and Systems for Video Technology, 23(7), 1109–1118.

24. Series FPGA SelectIO Resources User Guide UG361 (v1.4) June 21, 2013. http://japan.xilinx.com/support/documentation/user_guides/ug471_7Series_SelectIO.pdf

25. Gupta, V., Mohapatra, D., Raghunathan, A., & Roy, K. (2013). Low-power digital signal processing using approximate adders. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 32(1), 124–137.

26. Suming, L., Yan, B., & Li, P. (2013). Localized stability checking and design of IC power delivery with distributed voltage regulators. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 32(9), 1321–1324.

27. Singh, D., Pattanaik, M., & Pandey, B. (2013). IO standard based low power design of RAM and implementation on FPGA. Journal of Automation and Control Engineering, 1(4), 316–320.

28. Esmaeili, S. E., & Al Kahlili, A. J. (2013). Integrated power and clock distribution network. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 21(10), 1941–1945.

29. Pandey, B., Yadav, J., Singh, Y., Kumar, R., & Patel, S. (2013). Energy Efficient Design and Implementation of ALU on 40-nm FPGA. In IEEE international conference on energy efficient technologies for sustainability (ICEETs).

30. Singh, V. P., Chaurasia, V. S., Pandey, B., & Yadav, J. (2013). Power reduction of ITC’99-b01 benchmark circuit using clock gating techniques. In IEEE international conference on computational intelligence and communication networks (CICN), Mathura.

31. Kumar, T., et al. (2014). Mobile DDR IO standard based high performance energy efficient portable ALU design on FPGA. Springer Wireless Personal Communications, An International Journal, 76(3), 569–578.

Tanesh Kumar is currently serving as a lecturer at the Faculty of Engineering Sciences and Technology, Indus University, Karachi, Pakistan. He received his B.E. in Computer Engineering from National University of Sciences and Technology (E&ME), Rawalpindi, and completed his M.Sc. in Computer Science at South Asian University, New Delhi. His areas of interest include low power techniques in VLSI, green computing, energy efficient techniques on FPGA and the Internet of Things. He has authored and coauthored over 45 papers in journals and conference proceedings in various areas of green computing, FPGA, VLSI and its applications.


Bishwajeet Pandey is working in the Centre of Excellence of Chitkara University, Punjab Campus. He has worked as a Junior Research Fellow (JRF) at South Asian University (a university declared under the SAARC Charter) and as a visiting lecturer at IGNOU on weekends. He completed his M.Tech. at IIIT Gwalior and an R&D project at CDAC-Noida. He is working with hundreds of co-researchers from industry and academia to build global educational excellence in Gyancity Research Lab and Chitkara University Research and Innovation Network (CURIN). He has authored and coauthored over 150 papers in SCI/SCOPUS/peer-reviewed journals and IEEE/Springer conference proceedings in the areas of low power research in VLSI design, green computing, and electronic design automation. He has filed 2 patents at the Patent Office in the Intellectual Property Building, Delhi, and has authored 3 books available for sale on Amazon and Flipkart. He is a technical programme committee (TPC) member of various conferences across the globe. Every year, he organizes two Scopus-indexed conferences across the globe and two special sessions in IEEE/Springer conferences.

S. H. A. Mussavi holds a Ph.D. and M.E. in Telecommunication Engineering under an HEC Scholarship and a B.E. in Electronics Engineering from Mehran University of Engineering and Technology. He is currently serving as Dean, Faculty of Engineering Science and Technology, Indus University, Karachi. Previously he was Chairman, Department of Electrical and Electronics Engineering, Hamdard University, Karachi. He has more than 25 research publications in national and international journals. He has attended numerous international conferences as an invited speaker. He is on the review board of two impact factor international journals. He is a member of numerous national and international societies, including the IEEEP Karachi local council, IEEE, IEEE Computer Society, IEEE Signal Processing Society, IEEE Devices and Circuits Society, and IEEE Communications Society.

Noor Zaman acquired his Degree in Engineering in 1998 and his Master’s in Computer Science at the University of Agriculture in Faisalabad, Pakistan, in 2000. He went on to complete a Ph.D. in Information Technology at University Technology PETRONAS (UTP), Malaysia. He is currently working as a faculty member in the College of Computer Science and Information Technology, King Faisal University, Saudi Arabia. He has authored more than 50 research papers and edited three books. He is an Associate Editor, Regional Editor and Reviewer for a number of reputed international research journals. He has completed several international research grants and is currently involved with different funded projects in different countries. His areas of interest include Wireless Sensor Network (WSN), Mobile Computing, Cloud Computing, Network and Communication, Software Engineering, Green Technology, Artificial Intelligence, Operating System, Unix, and Linux.
