
HARDWARE AND SOFTWARE IMPLEMENTATION OF ARTIFICIAL NEURAL NETWORK IN ALTERA DE1-SOC

LIM CHUN MING

UNIVERSITI SAINS MALAYSIA

2017


HARDWARE AND SOFTWARE IMPLEMENTATION OF ARTIFICIAL NEURAL NETWORK IN ALTERA DE1-SOC

by

LIM CHUN MING

Thesis submitted in partial fulfillment of the requirements for the degree of

Bachelor of Engineering (Electronic Engineering)

JUNE 2017


ACKNOWLEDGEMENTS

This dissertation is dedicated to everyone in the field of artificial neural networks in embedded systems who embarks on the journey of expanding the body of knowledge with a transcendent passion for the field.

Firstly, I want to thank my final year project and research supervisor, Assoc. Prof. Dr. Zaini Abdul Halim, for her patient guidance and advice throughout my final year project. Her continuous support and precious advice were key factors in the successful completion of my project and are highly appreciated. I feel proud to be one of her students.

Besides, I would also like to thank the PhD student Tan Earn Tzeh for his precious advice and support during my final year project. He was always willing to share his experience and knowledge with me and to give me advice about the final year project and research.

In addition, I want to thank my precious course mates who studied together with me for four years of my university life. They were always willing to spend time with me, sitting together to exchange knowledge and discuss the problems faced during the final year project.

Lastly, I would like to thank my family for their love and support. Without them, I would not have successfully completed my final year project and thesis.


TABLE OF CONTENTS

ACKNOWLEDGEMENTS ... ii

TABLE OF CONTENTS ... iii

LIST OF TABLES ... v

LIST OF FIGURES ... vi

LIST OF ABBREVIATIONS ... ix

ABSTRAK ... xi

ABSTRACT ... xii

CHAPTER 1 INTRODUCTION ... 1

1.1 Research Background ... 1

1.2 Problem Statement ... 2

1.3 Objective of Research ... 3

1.4 Scope of project ... 3

1.5 Thesis Outline ... 4

CHAPTER 2 LITERATURE REVIEW ... 5

2.1 Overview ... 5

2.2 Artificial Neural Network ... 5

2.3 Processor (CPU) ... 6

2.4 Field Programmable Gate Array (FPGA) ... 7

2.5 CPU-FPGA Hybrid Platform ... 8

2.6 Hard Processor vs Soft Processor ... 8

2.7 Floating Point vs Fixed Point Number Representation ... 9

2.8 Related Works ... 9

2.9 Summary ... 17

CHAPTER 3 METHODOLOGY ... 18

3.1 Overview ... 18

3.2 Implementation Platform ... 19

3.2.1 Hard Processor System ... 20

3.2.2 Field Programmable Gate Array (FPGA) ... 21


3.3 Implementation of neuron ... 22

3.3.1 Software Implementation part ... 23

3.3.2 Hardware implementation part ... 25

3.4 Implementation of activation function ... 27

3.4.1 Pure Linear Function ... 27

3.4.2 Hyperbolic Tangent Sigmoid Function ... 27

3.5 Experimental Setup ... 30

3.5.1 Measure the accuracy of MLP models in ARM processor ... 35

3.5.2 Measure the execution time of MLP models in ARM processor ... 36

3.5.3 Measure the accuracy of MLP models in FPGA ... 37

3.5.4 Measure the execution time of MLP models in FPGA ... 38

3.5.5 Measure the FPGA hardware resources used by MLP models in FPGA ... 39

3.6 Performance Analysis ... 40

3.7 Summary ... 40

CHAPTER 4 RESULTS & DISCUSSION ... 41

4.1 Overview ... 41

4.2 Accuracy of MLP models in ARM processor ... 41

4.3 Accuracy of MLP models in FPGA ... 43

4.4 Comparison of accuracy ... 46

4.5 Execution time in ARM processor ... 47

4.6 Execution time of MLP models in FPGA ... 48

4.7 Comparison of execution time ... 50

4.8 Hardware Resources Utilization ... 52

4.9 Summary ... 54

CHAPTER 5 CONCLUSION ... 55

5.1 Conclusion... 55

5.2 Future Work ... 56

REFERENCES ... 57

APPENDICES ... 60

Appendix A: Results in Figure & Table ... 60

Appendix B: Coding ... 77


LIST OF TABLES

Table 2.1: Summary of Related Works...16

Table 3.1: Example of decimal number in 16 bits fixed point format...26

Table 4.1: Tabulated results for MATLAB and ARM processor...42

Table 4.2: Tabulated results from MATLAB and FPGA...44

Table 4.3: Comparison of accuracy in MSE...46

Table 4.4: Comparison of execution time...51

Table 4.5: The hardware resources usage by each model...53

Table A.1: Tabulated result for model 2 in ARM processor...60

Table A.2: Tabulated result for model 3 in ARM processor...60

Table A.3: Tabulated result for model 4 in ARM processor...61

Table A.4: Tabulated result for model 5 in ARM processor...62

Table A.5: Tabulated result for model 2 in FPGA...64

Table A.6: Tabulated result for model 3 in FPGA...64

Table A.7: Tabulated result for model 4 in FPGA...65

Table A.8: Tabulated result for model 5 in FPGA...66


LIST OF FIGURES

Figure 2.1: Multilayer Perceptron Model...6

Figure 2.2: FPGA fabric structure [15]...7

Figure 3.1: Project Implementation Flow...19

Figure 3.2: Software Design Flow...20

Figure 3.3: FPGA Hardware Design Flow...21

Figure 3.4: Multilayer perceptron models...22

Figure 3.5: Artificial neuron model...23

Figure 3.6: Algorithm flow chart for processing in neuron...23

Figure 3.7: Overall program flow chart of ANN...24

Figure 3.8: Block diagram for FPGA implementation of neuron...25

Figure 3.9: Overall block diagram for FPGA implementation of ANN...25

Figure 3.10: Data format for hardware implementation...26

Figure 3.11: Pure linear function [29]...27

Figure 3.12: Hyperbolic tangent sigmoid function [30]...28

Figure 3.13: Piecewise linear method...28

Figure 3.14: Piecewise linear functions of tangent sigmoid...29

Figure 3.15: Neural Network Toolbox...30

Figure 3.16: Data set provided by neural network toolbox...31

Figure 3.17: MLP model 1 structure...32


Figure 3.18: MLP model 2 structure...32

Figure 3.19: MLP model 3 structure...33

Figure 3.20: MLP model 4 structure...34

Figure 3.21: MLP model 5 structure...34

Figure 3.22: Linux terminal...35

Figure 3.23: Flow chart of test program...36

Figure 3.24: ModelSim Altera simulation...37

Figure 3.25: Maximum frequency summary report...38

Figure 3.26: Compilation report...39

Figure 4.1: Results shown in Linux terminal...42

Figure 4.2: Graph of comparison of accuracy in MSE...46

Figure 4.3: Results in Linux terminal...47

Figure 4.4: Simulation results from ModelSim Altera...48

Figure 4.5: Report of maximum frequency...49

Figure 4.6: Graph of comparison of execution time...51

Figure 4.7: Compilation report...52

Figure 4.8: Graph of hardware resources usage...53

Figure A.1: Simulation Result for model 1 in FPGA...63

Figure A.2: Total execution time shown in Linux Terminal for model 2...66

Figure A.3: Total execution time shown in Linux Terminal for model 3...67


Figure A.4: Total execution time shown in Linux Terminal for model 4...67

Figure A.5: Total execution time shown in Linux Terminal for model 5...67

Figure A.6: Total clocks used for model 2...68

Figure A.7: Maximum Frequency for model 2...68

Figure A.8: Total clocks used for model 3...68

Figure A.9: Maximum Frequency for model 3...69

Figure A.10: Total clocks used for model 4...69

Figure A.11: Maximum Frequency for model 4...70

Figure A.12: Total clocks used for model 5...70

Figure A.13: Maximum Frequency for model 5...71

Figure A.14: Hardware Resources usage by model 2...71

Figure A.15: Hardware Resources usage by model 3...72

Figure A.16: Hardware Resources usage by model 4...72

Figure A.17: Hardware Resources usage by model 5...73

Figure A.18: Result in MATLAB by model 1...73

Figure A.19: Result in MATLAB by model 2...74

Figure A.20: Result in MATLAB by model 3...74

Figure A.21: Result in MATLAB by model 4...75

Figure A.22: Result in MATLAB by model 5...75

Figure A.23: RTL block diagram for MLP model 1...76


LIST OF ABBREVIATIONS

ALM Adaptive Logic Module

ALU Arithmetic Logic Unit

ANN Artificial Neural Network

ASIC Application Specific Integrated Circuit

CPU Central Processing Unit (processor)

DSP Digital Signal Processing

FMAX Maximum Frequency

FPGA Field Programmable Gate Array

GUI Graphical User Interface

HDL Hardware Description Language

HPS Hard Processor System

LED Light Emitting Diode

LUT Look Up Table

MAC Multiply-Accumulate Circuit

MLP Multilayer Perceptron

MSE Mean Squared Error

PC Personal Computer

ROM Read Only Memory


RAM Random Access Memory

SOC System On Chip

VHDL Very High-Speed Integrated Circuit HDL

VT Ventricular Tachycardia

VF Ventricular Fibrillation


PERLAKSANAAN PERKAKASAN DAN PERISIAN RANGKAIAN NEURAL TIRUAN DALAM ALTERA DE1-SOC

ABSTRAK

Artificial neural networks are used in many applications and have begun to be used in embedded systems. Recently, new platforms such as the Altera DE1-SOC, which contain both a processor and an FPGA, have been introduced to the market. When using such a platform, an artificial neural network can be implemented either in the processor using software or in the FPGA using a hardware implementation. Analysis needs to be carried out to determine whether the processor or the FPGA is more suitable for implementing the artificial neural network. In this project, a framework for implementing an ANN on the processor and the FPGA of the Altera DE1-SOC was developed, and the performance of these implementations in terms of accuracy, execution time and resource utilization was studied. Several multilayer perceptron (MLP) models with different numbers of inputs, numbers of hidden neurons and types of activation function were first trained in MATLAB and then implemented on both the processor and the FPGA. Experiments were also carried out to measure the performance of these models in the processor and the FPGA. After comparing the outputs with the ANN in MATLAB and computing the mean squared error (MSE), the results show that the ANN on the processor has 100% accuracy and the ANN on the FPGA has a minimum MSE of 7.3 x 10^-6, while the ANN on the FPGA is 20 times faster than the ANN on the processor. Therefore, if a system requires high accuracy, the ANN is recommended to be implemented on the processor, whereas if execution time is critical, the ANN is recommended to be implemented on the FPGA.


HARDWARE AND SOFTWARE IMPLEMENTATION OF ARTIFICIAL NEURAL NETWORK IN ALTERA DE1-SOC

ABSTRACT

Artificial neural networks (ANN) have been widely used in many applications and have started to be implemented in embedded systems. Recently, new platforms such as the Altera DE1-SOC, which contain both a processor and an FPGA, have been introduced. On this type of platform, an artificial neural network can be implemented either in the processor using software or in the FPGA using a hardware implementation, so analysis is needed to determine whether the processor or the FPGA is the better choice for the ANN. In this project, a framework for implementing an ANN on the processor and on the FPGA of the Altera DE1-SOC has been developed, and the efficiency of each implementation in terms of accuracy, execution time and resource utilization has been studied. Several multilayer perceptron (MLP) models with different numbers of inputs, numbers of hidden neurons and types of activation function were first trained in MATLAB; these trained models were then implemented on both the processor and the FPGA of the Altera DE1-SOC. Experiments have been carried out to test and measure the performance of these MLP models on the processor and the FPGA. After comparing the outputs with the ANN run in MATLAB and computing the mean squared error (MSE), the results show that the ANN on the processor has 100% accuracy, while the ANN on the FPGA has a minimum MSE of 7.3 x 10^-6. Meanwhile, the ANN on the FPGA is 20 times faster than the ANN on the processor. Therefore, if accuracy is the main priority and execution time is less important, the ANN should be implemented on the processor; if the execution time of the ANN must be very short, for example less than a microsecond, the ANN should be implemented on the FPGA.


CHAPTER 1 INTRODUCTION

1.1 Research Background

An artificial neural network (ANN) is a computational model based on the interconnection of neurons in the nervous system of the human brain. An ANN can be trained on a data set using a learning algorithm to solve problems, and can be used for pattern recognition, classification, prediction and so on.

Today, artificial neural networks are widely used in various fields of application, such as photovoltaic systems [1], [2], medical diagnosis systems [3] and stock market prediction [4]. Most ANN implementations are PC-based software simulation models [5], [6], which are relatively slow compared to hardware implementations of ANN [7]. Besides, many ANN implementations in medical diagnosis systems are not portable [8], [9]. Thus, more research should be done on the hardware implementation of ANN models to increase the number of portable neural network solutions in the market.

When it comes to portable embedded platforms, CPUs, ASICs and FPGAs are available, and recently hybrid CPU-FPGA platforms that integrate a processor together with an FPGA have been introduced in the market. On October 11, 2011, Altera Corporation introduced a new SoC FPGA device that integrates a dual-core ARM Cortex-A9 MPCore processor with 28-nm Cyclone V and Arria V FPGA fabric [10]. Combining the high-level management functionality of a processor with the real-time, highly flexible, data-intensive processing of an FPGA makes this one of the most powerful embedded platforms.


Since a hybrid CPU-FPGA platform has both a processor and an FPGA, one may have to consider which to use for a given application. An FPGA, with its parallelism, has a higher data processing rate, but a high-performance design that uses up a lot of resources may come at a higher cost [11].

It is important to have a detailed analysis of the performance of an application such as an artificial neural network on the processor and on the FPGA, based on aspects like accuracy, execution time, resource utilization and cost, to justify whether implementation on the processor or on the FPGA is better for the artificial neural network.

In this project, the implementation of a multilayer perceptron (MLP) neural network on the Altera DE1-SOC platform is presented. Five MLP models with different structures, that is, different numbers of input features, numbers of hidden neurons and types of activation function, are first trained in MATLAB; the trained models are then implemented separately on the FPGA and on the ARM Cortex-A9 of the DE1-SOC. The performance of each model on the FPGA and on the processor is evaluated and analyzed in terms of accuracy, execution time and resource utilization. Through this project, readers may gain a better and more thorough understanding of the effectiveness of implementing ANN models on a processor versus an FPGA, with a clear performance comparison on a hybrid CPU-FPGA platform such as the Altera DE1-SOC.

1.2 Problem Statement

Recently, new hybrid CPU-FPGA platforms like the Altera DE1-SOC, which integrates an ARM Cortex-A9 with FPGA fabric, have been introduced in the market. When implementing an application or system such as an artificial neural network on a platform that has both a processor and an FPGA, some may wonder whether the artificial neural network should be


implemented in the processor or in the FPGA. Implementing the ANN model in the FPGA may give a high data processing rate, but the cost might be high if the design takes up a large area of hardware resources. An ANN model is easier to develop on the processor at lower cost, but its execution time might be longer. A detailed analysis should be done to determine whether the CPU or the FPGA is the better implementation choice for an artificial neural network on the hybrid Altera DE1-SOC platform.

1.3 Objective of Research

1) To develop a framework for the implementation of an artificial neural network on the Altera DE1-SOC board.

2) To study the efficiency of implementing an artificial neural network in the processor and in the FPGA in terms of accuracy, execution time and resource utilization.

1.4 Scope of project

In this project, a multilayer perceptron neural network will be implemented in the processor and in the FPGA of the Altera DE1-SOC board. Five multilayer perceptron models with different numbers of inputs, numbers of hidden neurons and types of activation function will first be trained in MATLAB; the trained networks will then be implemented in the processor using the C programming language and in the FPGA using the Verilog hardware description language. Experiments will then be carried out to test and measure the performance, in terms of accuracy, execution time and resource utilization, of the MLP models implemented in the processor and in the FPGA, and the experimental results will be analyzed.

1.5 Thesis Outline

This thesis consists of 5 chapters. Chapter 1 is the introduction, which discusses the research background and the main motivation for this research, the problem statement, the objectives of the research and the scope of the project.

Chapter 2 is the literature review, which reviews previous related work on the implementation of artificial neural networks in FPGAs and in microprocessors. Some basic knowledge related to the project is also discussed.

Chapter 3 is the methodology. It discusses the project implementation flow, the design flows for the software and the FPGA, the structure of the multilayer perceptron, the neuron model and how it is implemented, and the types of activation function and how they are implemented. Besides, the experimental procedure for testing and measuring the performance of the MLP models in terms of execution time, resource utilization and accuracy is also discussed.

Chapter 4 presents the results and analysis of the experiments designed in the previous chapter. The performance of the MLP models in the processor and in the FPGA in terms of accuracy, execution time and resource utilization is presented, and the results are analyzed and discussed.

Chapter 5 is the conclusion, which summarizes the work and the results of the project. Future work for this project is also included.


CHAPTER 2

LITERATURE REVIEW

2.1 Overview

Recently, researchers have started to study the hardware and software implementation of artificial neural networks on portable embedded platforms, and many implementation methods and types of ANN have been studied over the past few years. In this chapter, some fundamental knowledge related to the project is discussed, and previous work on the implementation of artificial neural networks on embedded platforms such as FPGAs and microprocessors is reviewed.

2.2 Artificial Neural Network

An artificial neural network is a computational model consisting of many interconnected nodes, or neurons, inspired by the way biological neural networks in the brain process information [12]. Each neuron in the artificial neural network is a processing element.

An artificial neural network can be trained to solve problems and used for pattern recognition, classification, prediction and so on. The most common form of artificial neural network is the multilayer perceptron, shown in Figure 2.1. A multilayer perceptron is a feedforward neural network in which data flows in one direction only, from the input layer through the hidden layer to the output layer. Each neuron in a layer is connected to all neurons in the next layer with certain weights.


Figure 2.1: Multilayer Perceptron Model

The most commonly used training algorithm for artificial neural networks is the backpropagation algorithm, in which the output is compared with the target data and an error function is computed. This error is then fed back through the network and the weights are adjusted based on the error [13]. The algorithm adjusts the weights to reduce the error function, and the process continues until the error becomes small.
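As a minimal sketch of this kind of weight update, the following C function applies one gradient-descent step to a single neuron with a linear output. It is illustrative only, since the actual training in this project is done in MATLAB:

```c
#include <math.h>
#include <stddef.h>

/* One training step for a single linear output neuron.
   The error is (target - output); each weight moves down the gradient
   of the squared error: w[i] += lr * error * x[i]. */
void update_weights(double *w, double *bias, const double *x, size_t n,
                    double output, double target, double lr)
{
    double error = target - output;
    for (size_t i = 0; i < n; i++)
        w[i] += lr * error * x[i];
    *bias += lr * error;
}
```

Repeating this step over the training set drives the error function down until it becomes small, as described above; full backpropagation applies the same idea layer by layer through the chain rule.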

2.3 Processor (CPU)

A processor, or central processing unit (CPU), is an electronic circuit that executes the instructions of a program in a sequential manner by performing arithmetic, logic and control operations. A CPU basically contains an arithmetic logic unit (ALU) that performs arithmetic and logic operations, registers that store the results of the ALU, and a control unit that controls the fetching and execution of instructions.

2.4 Field Programmable Gate Array (FPGA)

A field programmable gate array is a semiconductor device designed to be configured by a designer or customer for a desired application with certain functionality requirements [14]. FPGAs contain an array of programmable logic blocks and reconfigurable interconnects that allow the logic blocks to be wired together to perform complex combinational functions. A hardware description language (HDL) is used to specify the configuration of the logic blocks. Figure 2.2 shows the structure of the FPGA fabric.

Figure 2.2: FPGA fabric structure [15]

One of the important features of FPGAs is parallel processing, in contrast to a processor, which runs a program sequentially. Flexibility is another feature: the functionality of an FPGA can be changed every time the device is powered up, simply by downloading a new configuration file into the device.

2.5 CPU-FPGA Hybrid Platform

Recently, a new kind of hybrid platform integrating a processor with FPGA fabric has been released. The Altera DE1-SOC is one such CPU-FPGA hybrid platform: it integrates an ARM-based hard processor system with FPGA fabric using a high-bandwidth interconnect [16]. The hard processor system consists of dual-core ARM Cortex-A9 embedded cores, peripherals and memory interfaces. Besides, the Altera DE1-SOC board also provides Ethernet networking, video capabilities and DDR3 memory. With this hybrid platform, a user can implement a design in the processor only (using software), in the FPGA only (using a hardware description language), or in both, using a hardware-software co-design method.

2.6 Hard Processor vs Soft Processor

A hard processor is a processor system in which all components are fixed on the chip and cannot be reconfigured after manufacture [17]. A soft processor, on the other hand, is a processor system created using the resources of a field programmable gate array (FPGA) and can be customized and reconfigured to the desired specification. A soft processor offers better flexibility than a hard processor because of this ability to reconfigure [18]. However, a hard processor has lower power consumption and better performance than a soft processor.


2.7 Floating Point vs Fixed Point Number Representation

Floating point is a number representation used for fractional numbers [19]. It is based on scientific notation and consists of a mantissa and an exponent; the decimal point in a floating point format can "float" rather than being fixed. Fixed point is an alternative way to represent fractional numbers in which the decimal point is fixed: the number of digits or bits before and after the point does not change. Floating point has better dynamic range and precision than fixed point [20]. However, a fixed point implementation is less costly than a floating point implementation in hardware.
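To make the fixed point idea concrete, the following C sketch uses a hypothetical signed 16-bit format with 4 integer bits and 12 fractional bits. The bit allocation actually used in this project is given in Figure 3.10; this split is chosen purely for illustration:

```c
#include <stdint.h>

#define FRAC_BITS 12              /* hypothetical Q4.12 split of a 16-bit word */

typedef int16_t fixed16;

/* Scale by 2^12 to convert; no rounding or saturation, for simplicity. */
fixed16 to_fixed(double x)  { return (fixed16)(x * (1 << FRAC_BITS)); }
double  to_real(fixed16 x)  { return (double)x / (1 << FRAC_BITS); }

/* The 32-bit product of two Q4.12 values has 24 fractional bits,
   so shift right by 12 to return to Q4.12. */
fixed16 fixed_mul(fixed16 a, fixed16 b)
{
    return (fixed16)(((int32_t)a * (int32_t)b) >> FRAC_BITS);
}
```

Note how multiplication reduces to an integer multiply and a shift, which is exactly why fixed point is cheaper than floating point in hardware; the price is that any value needing more than 4 integer bits overflows, and precision is limited to 2^-12.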

2.8 Related Works

K. V. Ramanaiah et al. [7] proposed a parallel multilayer perceptron neural network architecture implemented on a Xilinx Virtex family FPGA device. The proposed hardware architecture was implemented in two phases. The first phase was the training of the neural network in MATLAB, where the weights and biases were updated based on the backpropagation training algorithm. The second phase was the hardware implementation of the trained neural network on the Xilinx FPGA device. The size of the neural network was 16-4-16, that is, a first layer with 16 input features, a hidden layer with 4 neurons and an output layer with 16 neurons. For the multiplication part of the architecture, Wallace-tree-based arithmetic was used. The activation function in the hidden layer was the tangent sigmoid, while the output layer used a pure linear function. To implement the tangent sigmoid function in the FPGA, a look-up table (LUT) method was used, with the LUT stored in ROM. After experimental analysis, the architecture on the FPGA was found to operate at the


maximum speed of 138 MHz, occupying 3% of the resources and consuming a total power of 55.285 mW.

Other researchers, Andres Orozco-Duque et al. [21], presented the implementation of an artificial neural network on a 32-bit ARM Cortex-M4 microcontroller and on a Xilinx Spartan-6 FPGA for real-time detection of ventricular tachycardia (VT) and ventricular fibrillation (VF). The implemented neural network had 4 layers: an input layer with 4 neurons, 2 hidden layers each with 3 neurons, and an output layer with one neuron. The activation function used in the hidden layers was the sigmoid function, and the output layer used a pure linear function. The ANN model was first trained in MATLAB using the backpropagation algorithm. The trained ANN model was then implemented in the FPGA using Xilinx System Generator, with a ROM block used to implement a step-by-step approximation of the sigmoid function. For the microcontroller, the neural network was implemented in C using the CMSIS 2.0 library, which contains more than 60 DSP functions, including fixed point and single precision floating point math, that help implement the prediction algorithm of the neural network with matrix operations. For the activation function, no approximation is necessary as in the FPGA, since the "math.h" library in C can be used. After experimental analysis, test results showed an accuracy of 99.46%, with an execution time of 30 us in the FPGA and less than 0.6 ms in the microcontroller. Orozco-Duque et al. noted that low power consumption was an advantage of the microcontroller, and that complex mathematical functions were easy to implement using the C programming language.

A. Thilagavathy and K. Vijaya Kanth [22] presented a digital hardware implementation of an artificial neural network for image recognition. The proposed neural network was a multilayer perceptron with 1 input layer, 1 hidden layer and 1


output layer, implemented on a Xilinx Spartan-3E device using VHDL. Three types of activation function were used in this research: hyperbolic tangent sigmoid, symmetric saturating linear and symmetric hard limiter; the motivation was to test these three activation functions and compare their performance in the neural network. After simulation using ModelSim 5.7, results showed that the total execution time of the multilayer perceptron network was 15.820 ns with the hard limit function, 15.231 ns with the saturating linear function and 29.598 ns with the hyperbolic tangent sigmoid function. In terms of resource utilization, the hard limit function used the fewest FPGA resources and the hyperbolic tangent sigmoid used the most. This research is valuable because it tested three different activation functions and compared their performance, but it did not describe in detail the method used to implement these activation functions, and different implementation methods may affect the experimental results.

Philippe Dondon et al. [23] also implemented a feedforward neural network on an FPGA using the VHDL language, on a Zynq-7000 EPP 7Z020 ZedBoard kit. The ANN model was first trained in MATLAB; after training, the weights and biases were extracted. ROM was used to store the weights and biases, and RAM was used to store the network inputs. The data format was 32 bits: 1 sign bit, 21 bits for the integer part and 10 bits for the fractional part. A state machine was used to control the modules. The network had 3 layers: an input layer with 3 neurons, a hidden layer with 3 neurons and an output layer with only 1 neuron. Two structural solutions for the neural network were tested. The first used one module per neuron, with each neuron operating in parallel: once initialization started, each neuron carried out its multiplication and summing


process and finally sent its result to RAM. This solution offers a speed advantage but may use a lot of FPGA resources. The second solution used only 1 module per layer, with an additional module attached to each neuron module to track its position in the network. This structure consumed fewer FPGA resources than the first solution but increased the processing time. After simulation, the first solution consumed 16 of the 220 DSP components, which would allow a total of 55 neurons to be implemented. In this research, the authors did not test different types of activation function, which may be a factor affecting the overall resource consumption of the design.

Sami El Moukhlis et al. [24] proposed another FPGA implementation of an artificial neural network, for the classification of handwritten signatures. The ANN model was first trained in MATLAB using the backpropagation algorithm, after which the weights and biases were extracted. The proposed ANN model was implemented on an Altera DE2-70 FPGA and described using the Very High Speed Integrated Circuit Hardware Description Language (VHDL). The proposed architecture has 3 layers, with 20 neurons in the input layer, 6 neurons in the hidden layer and 1 neuron in the output layer. Both the hidden and output layers used the sigmoid activation function, and a 16-bit multiply-accumulate (MAC) circuit was used. The sigmoid function was implemented with a piecewise linear approximation of the log-sigmoid function instead of a look-up table (LUT). After simulation, the proposed design used 39% of the resources and only 5% of the embedded multipliers. Using piecewise linear approximation instead of a LUT may decrease the amount of resources needed, but might increase the execution time of the ANN model, since a LUT is better than piecewise linear approximation in terms of execution time.
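A piecewise linear approximation of the kind used in [24] can be sketched in C as below. The segment boundaries and slopes here are illustrative values for the logistic sigmoid, chosen as powers of two so that each multiply reduces to a shift in hardware; they are not the coefficients of the cited work:

```c
#include <math.h>

/* Three-segment (plus saturation) piecewise linear approximation of the
   logistic sigmoid for x >= 0; negative inputs use sigmoid(-x) = 1 - sigmoid(x).
   Maximum absolute error is roughly 0.02. */
double pwl_sigmoid(double x)
{
    double ax = fabs(x), y;
    if      (ax >= 5.0)   y = 1.0;
    else if (ax >= 2.375) y = 0.03125 * ax + 0.84375;   /* slope 2^-5 */
    else if (ax >= 1.0)   y = 0.125   * ax + 0.625;     /* slope 2^-3 */
    else                  y = 0.25    * ax + 0.5;       /* slope 2^-2 */
    return (x >= 0.0) ? y : 1.0 - y;
}
```

Compared with a LUT, this needs only comparators, shifts and adds rather than a block of memory, which is why it tends to save resources at the cost of extra evaluation latency.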


Andre L. S. Braga et al. [25] carried out an interesting study analyzing the implementation of ANN models both in an FPGA, using a hardware description language (HDL), and in a Xilinx MicroBlaze embedded soft microprocessor, using a software approach, in terms of resource utilization, accuracy, speed and power consumption. The authors highlighted several factors that affect the performance of an ANN implementation, such as the data representation and the way the activation function is implemented. They created 3 ANN models for the FPGA in VHDL and 4 models for the MicroBlaze soft microprocessor.

The first FPGA model, named VHDL A, used a 13-bit fixed point format, with 3 bits for the integer part and 10 bits for the fractional part, and implemented the activation function as a look-up table (LUT). VHDL B was also 13-bit fixed point, with the activation function implemented as a combinational design. VHDL C used a 32-bit fixed point format, with 4 integer bits and 28 fractional bits, and implemented the activation function with a piecewise linear approximation technique. Besides, the same ANN model was programmed in the C language and implemented on the MicroBlaze embedded soft microprocessor, after which the experiments were carried out. The results showed an increasing number of logic blocks used from VHDL A to VHDL C. VHDL A used the most memory of all the designs, due to the look-up table, which consumed a lot of memory resources. The MicroBlaze design had better accuracy than the FPGA designs, but the FPGA designs were far faster.

The research done by Andre L.S. Braga, et al. [25] was valuable because different ANN models were analyzed in terms of power consumption, resource utilization, accuracy and speed. However, other important factors such as the number of inputs, the number of hidden neurons and the type of activation function were not taken into account in the experimental analysis. These factors may also affect the performance of the FPGA design.


A. P. do A. Ferreira and E. N. da S. Barros [26] proposed an architecture that supports the implementation of large ANN models, with a focus on area reduction.

The proposed architecture had 256 input neurons, 10 hidden neurons and 10 output neurons. Instead of using m+1 multipliers and computing each neuron individually, the proposed method used fewer adders for the entire layer, which helps reduce resource consumption. Another important feature of this architecture is that the area versus performance trade-off can be tuned by making small changes in the circuit. All designs were implemented in Verilog and synthesized with Synopsys Synplify Premier and Altera Quartus II. After simulation, the final designs consumed 30% of the ALUT components. In this research, the authors did a good job of analyzing a large architecture with the proposed method. However, the proposed method focuses only on the structure of the multiply-accumulate unit and does not consider the type of activation function, which may be an important factor affecting the performance of the whole architecture.

Ozdemir, A.T. and Danisman, K. [27] carried out a comparative study of two artificial neural network architectures. The first model was a 32-bit floating-point FPGA-based model, with the sigmoid function used at the hidden and output layers. The second model was a 16-bit fixed-point FPGA-based model, with the sigmoid function used at the hidden layer and a pure linear function at the output layer. The piecewise linear approximation method was used to implement the sigmoid function. Both architectures had 8 input neurons, 2 hidden neurons and 1 output neuron. The first model achieved an accuracy of 97.66% and the second model 96.54%. However, the resource utilization of the second model was lower than that of the first, because the piecewise linear approximation of the sigmoid consumed less hardware than the real sigmoid activation function block in the first model. The classification time for the first


model was 1.07 µs, versus 1.01 µs for the second model. The second model was faster because of its 16-bit fixed-point numerical representation, which greatly reduced the routing in the FPGA.

Boussaa Mohamed, et al. [28] compared Nios II software-based and hardware-based implementations of a multilayer perceptron neural network on the Altera DE2-70 FPGA in terms of execution time and hardware resource utilization. Nios II is a 32-bit soft-core microprocessor built from the FPGA's logic resources. The hardware implementation on the FPGA was described in VHDL. The sum, multiplication, division and exponential operations used megafunction blocks described in VHDL; these megafunction blocks are intellectual property provided by the developer of Quartus. For the software implementation, the Nios II microprocessor was first instantiated using Quartus and SOPC Builder.

After that, the multilayer perceptron model was programmed using the Nios II EDS, which allows the use of C, C++ and assembly language. Once both the hardware and software implementations were done, they were tested. The architecture used had 4 input neurons, 3 hidden neurons and 2 output neurons, and all models received the same synaptic weights and input vectors. The results showed an execution time of 1 s for the software model and 1.9 µs for the hardware model. Although the hardware model was far faster than the software model, the software model consumed fewer resources.

Overall, the research done by Boussaa Mohamed, et al. [28] was good because it compared a software ANN model on a soft processor with a hardware ANN model on an FPGA in terms of execution time and hardware resource utilization. However, only one ANN architecture size, 4-3-2, was tested. It would have been better to test several more architectures that take into account factors such as the number of inputs and the type of activation function, to make the analysis more detailed.


Table 2.1: Summary of Related Works

| References | Year Published | Platform | ANN architecture | Activation function of output layer | Measurement | Features |
|---|---|---|---|---|---|---|
| [25] Andre L.S. Braga, et al. | 2010 | Xilinx FPGA | 1 input, hidden and output layer | Hyperbolic tangent sigmoid | Accuracy, execution time, power consumption, resource utilization | Comparison of ANN on FPGA and soft microprocessor in terms of time, accuracy, power and resources |
| [22] A. Thilagavathy and K. Vijaya Kanth | 2013 | Xilinx Spartan 3E | 1 input, hidden and output layer | Hyperbolic tangent sigmoid, saturating linear, hard limiter | Execution time, resource utilization | Several activation functions implemented and their performance compared |
| [27] Ozdemir, A.T. and Danisman, K. | 2013 | FPGA | 8-2-1 | Sigmoid, pure linear | Execution time, resource utilization | Two types of FPGA-based ANN compared |
| [7] K.V. Ramanaiah, et al. | 2014 | Xilinx Virtex FPGA | 16-4-6 | Pure linear | Execution time, resource utilization | ANN with large size implemented |
| [28] Boussaa Mohamed, et al. | 2015 | FPGA Altera DE2 | 4-3-2 | Not reported | Execution time, resource utilization | Comparison of ANN on FPGA and soft processor Nios II in terms of time and resources |


2.9 Summary

Overall, quite a number of researchers have studied the implementation of artificial neural networks on FPGAs or embedded soft microprocessors. The findings show that most researchers focused only on the FPGA implementation of ANNs and did not analyze or compare the performance of processor and FPGA implementations. Although Andre L.S. Braga [25] did analyze and compare the performance of an ANN model implemented on an FPGA against a software implementation on an embedded soft microprocessor, in terms of resource utilization, execution time and accuracy, there is still room for improvement: a more detailed analysis should include other important factors such as the number of inputs, the number of hidden neurons and the type of activation function. Besides, there is still a lack of detailed analysis of ANNs on a hybrid platform such as the Altera DE1-SoC, which consists of a hard processor and an FPGA. It would be better if several ANN models covering the previously mentioned factors were tested on the processor and FPGA of such a hybrid platform.


CHAPTER 3 METHODOLOGY

3.1 Overview

In this project, an artificial neural network (ANN) will be implemented on an FPGA using Verilog and on a processor using the C programming language. The platform used is the Altera DE1-SoC, which contains both a hard processor system (HPS) and FPGA fabric. The ANN to be implemented is a multilayer perceptron (MLP) with 1 input layer, 1 hidden layer and 1 output layer. Five MLP models with different activation functions, numbers of input features and numbers of hidden neurons will first be trained in MATLAB; the trained networks will then be implemented on the FPGA and HPS. Experiments will then be carried out, and the performance of these MLP models on the FPGA and HPS will be measured and analyzed in terms of execution time, resource utilization and accuracy. With these data, the performance of artificial neural network implementations on the FPGA and the processor can be directly compared. The detailed process, including the implementation of the ANN and the experimental procedure, is explained in this chapter.

Figure 3.1 shows the overall project implementation flow.


Figure 3.1: Project Implementation Flow

3.2 Implementation Platform

In this project, the Altera DE1-SoC board will be used for the ANN implementation. The Altera DE1-SoC is a hybrid platform that integrates an ARM-based hard processor system, consisting of a processor, memory interfaces and peripherals, with industry-leading FPGA fabric, connected through high-bandwidth interconnects.


3.2.1 Hard Processor System

The processor of the hard processor system (HPS) is an 800 MHz dual-core ARM Cortex-A9 MPCore with 1 GB of DDR3 SDRAM on a 32-bit data bus. The processor executes a program sequentially. Figure 3.2 shows the software design flow. First, the design or architecture of the ANN is defined; it is then described in C/C++ code. After coding, the program is compiled and debugged, and is then ready to be uploaded to the processor for execution. Linux is used as the operating system for the ARM processor.

Figure 3.2: Software Design Flow


3.2.2 Field Programmable Gate Array (FPGA)

The FPGA is an Altera Cyclone V SE 5CSEMA5F31C6N device with 64 MB of SDRAM on a 16-bit data bus. The FPGA is able to process data in parallel. The FPGA hardware design flow is shown in Figure 3.3. First, the design or architecture of the ANN is defined; it is then described in the Verilog hardware description language using the ModelSim-Altera software. After coding, functional verification is carried out to check that the design is described correctly. If the design is correct, logic synthesis is performed. Place and route is the process of interconnecting all the logic blocks.

Figure 3.3: FPGA Hardware Design Flow


3.3 Implementation of neuron

The artificial neural network used in this project is a multilayer perceptron with 3 layers: input layer, hidden layer and output layer. A multilayer perceptron is a type of feedforward neural network in which information flows through the layers in one direction only. A unit in a layer is called a neuron, and each neuron is connected, with certain weights, to all neurons in the next layer. All neurons except those in the input layer are considered processing elements. Figure 3.4 shows the structure of a multilayer perceptron neural network.

Figure 3.4: Multilayer perceptron models

Figure 3.5 shows the structure of an artificial neuron, where the inputs of the neuron are multiplied by their respective weights. The products are then summed together with the bias, and the result passes through the activation function, which defines the output based on the inputs.


Figure 3.5: Artificial neuron model

3.3.1 Software Implementation part

To implement a neuron on the processor, a loop is used to multiply the inputs by their weights until all inputs have been processed. After that, the bias is added to the sum and the activation function is applied. Figure 3.6 shows the program flow chart for the processing in a neuron.

Figure 3.6: Algorithm flow chart for processing in neuron


A 64-bit double-precision floating-point data type will be used for the inputs, weights and biases in the C program of the ANN. Figure 3.7 shows the overall program flow chart of the ANN. The neurons are processed one by one: after all neurons in the hidden layer have been processed, the neurons in the output layer are processed one by one until the final output is obtained.

Figure 3.7: Overall program flow chart of ANN.


3.3.2 Hardware implementation part

To implement a neuron on the FPGA, a multiplier multiplies one input by its weight at each clock. The results are then fed to an accumulator to be summed. At the end, the bias is added to the accumulated result, which is then passed to the activation function to obtain the output. Figure 3.8 shows the block diagram for the hardware implementation of the neuron in the FPGA.

Figure 3.8: Block diagram for FPGA implementation of neuron

Figure 3.9 shows the overall block diagram for the FPGA implementation of the ANN. The neurons in a layer operate in parallel during each clock. After the neurons in the hidden layer finish processing, an enable signal is sent to the neurons in the output layer to start their operation.

Figure 3.9: Overall block diagram for FPGA implementation of ANN
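The behaviour of the MAC circuit in Figure 3.8 can be modelled in C roughly as follows. This is a behavioural sketch only: it uses a two's-complement 32-bit accumulator for clarity, whereas the actual design uses the sign-magnitude format described in the next section.

```c
#include <stdint.h>

#define FRAC_BITS 10   /* Q5.10: 10 fractional bits, as used in this design */

/* One multiply per clock cycle; products are accumulated, and the bias is
   added at the end, mirroring the multiplier/accumulator blocks. */
int16_t mac_neuron(const int16_t *inputs, const int16_t *weights,
                   int16_t bias, int n)
{
    int32_t acc = 0;
    for (int clk = 0; clk < n; clk++) {
        /* a 16x16-bit product carries 20 fractional bits; shift back to 10 */
        acc += ((int32_t)inputs[clk] * (int32_t)weights[clk]) >> FRAC_BITS;
    }
    acc += bias;
    return (int16_t)acc;   /* truncate to the 16-bit data path */
}
```

The shift after each multiplication is the key design point: the product of two 10-fractional-bit values has 20 fractional bits, so it must be shifted right by 10 to return to the 16-bit data format.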


Data Representation Format

The data format used in the hardware implementation is 16-bit fixed point, with 10 bits for the fractional part, 5 bits for the integer part and 1 bit for the sign. For the fractional part, starting from the left, the first (leftmost) bit represents 2^-1 or 0.5, the second bit 2^-2 or 0.25, and the n-th bit 2^-n. The integer part is interpreted as usual, with the least significant bit at the right. For example, if the integer bits are 00010, the value is (0 x 2^0) + (1 x 2^1) = 2. If the fractional bits are 1100000000, the value is (1 x 2^-1) + (1 x 2^-2) + (0 x 2^-3) = 0.75. Putting the integer and fractional bits together as (00010 1100000000) gives 2.75.

Figure 3.10: Data format for hardware implementation

For negative numbers, sign-magnitude representation is used: if the sign bit is 1 the number is negative, and if the sign bit is 0 it is positive. The magnitude bits are the same for positive and negative numbers. The sign-magnitude format is used because it is simple to understand and implement.

Table 3.1 shows some examples of decimal numbers in 16-bit fixed-point format.

Table 3.1: Examples of decimal numbers in 16-bit fixed-point format.

| Decimal Number | 16-Bit Fixed Point |
|---|---|
| 1 | 0 00001 0000000000 |
| -1 | 1 00001 0000000000 |
| 0.5 | 0 00000 1000000000 |
| -0.5 | 1 00000 1000000000 |
| -2.75 | 1 00010 1100000000 |
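The conversion between decimal values and this sign-magnitude format can be sketched in C as below; the helper names are illustrative assumptions, not part of the actual design.

```c
#include <stdint.h>
#include <math.h>

/* Bit 15: sign, bits 14..10: integer part, bits 9..0: fractional part.
   The magnitude is simply |x| scaled by 2^10 = 1024. */
uint16_t to_fixed(double x)
{
    uint16_t sign = (x < 0.0) ? 0x8000 : 0x0000;
    uint16_t mag  = (uint16_t)(fabs(x) * 1024.0 + 0.5) & 0x7FFF;
    return sign | mag;
}

double from_fixed(uint16_t v)
{
    double mag = (double)(v & 0x7FFF) / 1024.0;
    return (v & 0x8000) ? -mag : mag;
}
```

For example, to_fixed(2.75) produces the bit pattern 0 00010 1100000000, matching the last row of Table 3.1 with the sign inverted.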


3.4 Implementation of activation function

There are many types of activation functions; the ones implemented in this project are the hyperbolic tangent sigmoid function and the pure linear function. The activation function of the hidden layer will be the tangent sigmoid function, and the activation function of the output layer will be either the tangent sigmoid function or the pure linear function.

3.4.1 Pure Linear Function

This function, shown in Figure 3.11, is trivial to implement: in software the output simply equals the input, and in hardware the block feeds the input straight to the output without any operation.

Equation: f(n) = n

Figure 3.11: Pure linear function [29]

3.4.2 Hyperbolic Tangent Sigmoid Function

The hyperbolic tangent sigmoid function, shown in Figure 3.12, is easy to implement in software, since its mathematical equation can be described directly in C/C++. However, the hardware implementation of this non-linear function is difficult due to its exponential nature. Several methods can be used to realize this non-linear activation function; the method used here is piecewise linear approximation.


Equation: f(n) = 2 / (1 + e^(−2n)) − 1

Figure 3.12: Hyperbolic tangent sigmoid function [30]

Piecewise linear approximation constructs a function that fits a non-linear function using several linear segments, each valid over a certain interval. The blue line in Figure 3.13 is the tangent sigmoid function, and the red lines are the linear segments fitted to it. After applying the piecewise linear approximation method in MATLAB, the non-linear tangent sigmoid function becomes a set of linear functions, i.e. a piecewise linear function.

Figure 3.13: Piecewise linear method


Figure 3.14 shows the piecewise linear function equations derived from the graph in Figure 3.13. These linear segments provide a close approximation to the tangent sigmoid function and can be implemented in the FPGA using multipliers and adders.

f(x) =
  −1 , x < −3.0
  0.01688x − 0.94441 , −3.0 ≤ x < −2.5
  0.04517x − 0.87368 , −2.5 ≤ x < −2.0
  0.11775x − 0.72851 , −2.0 ≤ x < −1.5
  0.28710x − 0.47448 , −1.5 ≤ x < −1.0
  0.59895x − 0.16264 , −1.0 ≤ x < −0.5
  0.92423x , −0.5 ≤ x < 0.5
  0.59895x + 0.16264 , 0.5 ≤ x < 1.0
  0.28710x + 0.47448 , 1.0 ≤ x < 1.5
  0.11775x + 0.72851 , 1.5 ≤ x < 2.0
  0.04517x + 0.87368 , 2.0 ≤ x < 2.5
  0.01688x + 0.94441 , 2.5 ≤ x < 3.0
  1 , x ≥ 3.0

Figure 3.14: Piecewise linear functions of tangent sigmoid
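The thirteen segments above translate directly into a table of slopes and offsets. The C sketch below shows one way to evaluate them; floating point is used here for clarity, whereas the FPGA evaluates the segments in 16-bit fixed point.

```c
#include <math.h>

/* Piecewise linear approximation of the tangent sigmoid (Figure 3.14).
   Each entry gives the lower bound of an interval and the line's slope
   and offset; outside [-3, 3] the output saturates at -1 or +1. */
double pwl_tanh(double x)
{
    static const struct { double lo, slope, offset; } seg[] = {
        {-3.0, 0.01688, -0.94441}, {-2.5, 0.04517, -0.87368},
        {-2.0, 0.11775, -0.72851}, {-1.5, 0.28710, -0.47448},
        {-1.0, 0.59895, -0.16264}, {-0.5, 0.92423,  0.00000},
        { 0.5, 0.59895,  0.16264}, { 1.0, 0.28710,  0.47448},
        { 1.5, 0.11775,  0.72851}, { 2.0, 0.04517,  0.87368},
        { 2.5, 0.01688,  0.94441},
    };
    if (x < -3.0) return -1.0;
    if (x >= 3.0) return  1.0;
    int i = 10;                  /* walk down to the matching interval */
    while (i > 0 && x < seg[i].lo)
        i--;
    return seg[i].slope * x + seg[i].offset;
}
```

Note that the segment endpoints coincide with tanh at the interval boundaries, e.g. pwl_tanh(0.5) = 0.46212 = tanh(0.5) to four decimal places; the error is largest between boundaries.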


3.5 Experimental Setup

Five MLP models with different numbers of inputs, numbers of hidden neurons and types of activation function will be built and tested on the processor and FPGA. Several MLP models are tested because these factors may affect the ANN's execution time, resource utilization and accuracy on the processor and FPGA. Thus, to obtain a more detailed analysis, 5 models covering these factors will be evaluated. Testing will measure the performance of the MLP models on the processor and FPGA in terms of execution time, resource utilization and accuracy.

The five MLP models will be trained in MATLAB using data sets such as Iris Flower and Wine Vintage, provided by the MATLAB Neural Network Toolbox for classification and machine learning. Figure 3.15 shows the Neural Network Toolbox used to train the MLP models.

Figure 3.15: Neural Network Toolbox


Figure 3.16 shows the data sets provided by the Neural Network Toolbox for training neural networks. All of these data sets come from the UCI Machine Learning Repository.

Figure 3.16: Data set provided by neural network toolbox.

All MLP models are trained using the backpropagation algorithm. The trained neural network is then deployed as a MATLAB function using the MATLAB genfunction command; this generated function contains all the weights and biases of the trained network. These weights and biases are then used to build and complete the MLP models on the processor and FPGA using the methods mentioned previously. Before testing the performance of these models, functional verification is carried out to make sure the models have been implemented correctly.


The five MLP models are shown below:

a) MLP Model 1

Figure 3.17: MLP model 1 structure

Number of inputs: 3

Number of hidden neurons: 3
Number of output neurons: 2

Activation function of hidden layer: tangent sigmoid
Activation function of output layer: tangent sigmoid
Data set in MATLAB: Iris Flower

b) MLP Model 2

Figure 3.18: MLP model 2 structure


Number of inputs: 3

Number of hidden neurons: 10
Number of output neurons: 2

Activation function of hidden layer: tangent sigmoid
Activation function of output layer: tangent sigmoid
Data set in MATLAB: Iris Flower

c) MLP model 3

Figure 3.19: MLP model 3 structure

Number of inputs: 3

Number of hidden neurons: 3
Number of output neurons: 2

Activation function of hidden layer: tangent sigmoid
Activation function of output layer: pure linear
Data set in MATLAB: Iris Flower


d) MLP model 4

Figure 3.20: MLP model 4 structure

Number of inputs: 10

Number of hidden neurons: 3
Number of output neurons: 2

Activation function of hidden layer: tangent sigmoid
Activation function of output layer: tangent sigmoid
Data set in MATLAB: Wine Vintage

e) MLP model 5

Figure 3.21: MLP model 5 structure

Number of inputs: 3

Number of hidden neurons: 6


Number of output neurons: 2

Activation function of hidden layer: tangent sigmoid
Activation function of output layer: tangent sigmoid
Data set in MATLAB: Iris Flower

3.5.1 Measure the accuracy of MLP models in ARM processor

To measure the accuracy of the MLP models executed on the ARM processor, each ANN program is uploaded to the ARM processor for execution. The Linux terminal is used to key in the input data and to display the ANN's results. Twenty data points are selected from the data set used during training in MATLAB and are tested for each model.

The results will be compared with the results of the ANN in MATLAB, and the Mean Square Error will be computed according to equation 3.1,

MSE = (1/n) Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²    (3.1)

where n = total number of data, y = output from MATLAB and ŷ = output from the ARM processor; that is, the total of the squared differences is divided by the total number of data. Figure 3.22 shows the Linux terminal.
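Equation 3.1 can be computed with a short C helper like the one below; the function name is an illustrative assumption.

```c
#include <stddef.h>

/* Mean square error between the MATLAB reference outputs y and the
   outputs y_hat measured on the target, per equation 3.1. */
double mse(const double *y, const double *y_hat, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = y[i] - y_hat[i];
        sum += d * d;   /* accumulate squared differences */
    }
    return sum / (double)n;
}
```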

Figure 3.22: Linux terminal


3.5.2 Measure the execution time of MLP models in ARM processor

To measure the total time used by the ARM processor for an MLP model, a test program is uploaded to the ARM processor. The test program uses the C library 'time.h'. At the beginning of the test program, the input data are entered in the Linux terminal. A clock is then started, counting in microseconds, and the ANN program starts. Since a single ANN pass is very short and its time is difficult to measure directly, the ANN program is executed 100000 times using a for loop. After the ANN program finishes, the timer stops. The total time used is computed by dividing the total clock count by the clocks per second (1 mega clocks per second), and the measured time is shown in the terminal. This is the total execution time of 100000 ANN passes. Figure 3.23 shows the flow chart of the test program. The total time is then divided by 100000 to obtain the execution time of one ANN pass.
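The test program in Figure 3.23 can be sketched as follows; run_ann() stands in for the actual network code and is an assumption for illustration.

```c
#include <time.h>

#define ITERATIONS 100000L

static volatile double sink;                 /* keeps the loop from being optimised away */
static void run_ann(void) { sink += 1.0; }   /* placeholder for the real ANN pass */

/* Time ITERATIONS passes with clock() from time.h and return the average
   time of a single pass in seconds, as described above. */
double time_one_pass(void)
{
    clock_t start = clock();
    for (long i = 0; i < ITERATIONS; i++)
        run_ann();
    clock_t end = clock();
    double total = (double)(end - start) / CLOCKS_PER_SEC;
    return total / ITERATIONS;
}
```

Note that clock() measures CPU time consumed by the process, which is appropriate here since the ANN loop is purely compute-bound.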

Figure 3.23: Flow chart of test program.


3.5.3 Measure the accuracy of MLP models in FPGA

The ANN hardware design will not be programmed and run on the Altera DE1-SoC board to obtain the ANN output, because doing so would require extra elements such as input hardware (e.g. sensors) and output hardware (e.g. LEDs and a GUI). Adding these components could affect the result, since the focus is only on the performance of the ANN. Thus, for the accuracy measurement, simulation in ModelSim-Altera is used instead: a test bench is run in ModelSim-Altera to simulate the output of the ANN in the FPGA. Twenty data points are selected from the data set used during training in MATLAB and are tested for each model; these are the same twenty data points used in the accuracy test on the ARM processor. The output results are compared with the results of the ANN in MATLAB, and the Mean Square Error is computed according to equation 3.1,

MSE = (1/n) Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²    (3.1)

where n = total number of data, y = output from MATLAB and ŷ = output from the FPGA; that is, the total of the squared differences is divided by the total number of data. Figure 3.24 shows the simulation in ModelSim-Altera.

Figure 3.24: ModelSim Altera simulation


3.5.4 Measure the execution time of MLP models in FPGA

As with the accuracy test, the ANN hardware design is not programmed and run on the board to measure the execution time, because there is currently no tool or method to measure the execution time of the ANN running on the board. To measure the execution time of the ANN in the FPGA, the total number of clock cycles used by the ANN is measured in ModelSim-Altera by simulation. Then, timing analysis with the TimeQuest Timing Analyzer is run in the Quartus software to evaluate the total propagation delay of the ANN design and compute the maximum operating frequency (Fmax) the design can use. The total execution time of the ANN model is then calculated by multiplying the total number of clock cycles by the time per clock (the inverse of the maximum frequency).
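The calculation in the last step is simple enough to state as code; the cycle count and Fmax values in the usage note are illustrative numbers, not measured results.

```c
/* Execution time = total clock cycles x (1 / Fmax). */
double exec_time_s(unsigned long cycles, double fmax_hz)
{
    return (double)cycles / fmax_hz;
}
```

For example, a hypothetical design taking 100 cycles at an Fmax of 50 MHz would give 100 / 50e6 = 2 µs.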

Figure 3.25 shows an example maximum frequency summary report for the design models.

Figure 3.25: Maximum frequency summary report


39

3.5.5 Measure the FPGA hardware resources used by MLP models in FPGA

To measure the total FPGA hardware resources used by an ANN design, the Verilog design is compiled in the Quartus software after functional verification, and logic synthesis is carried out. A compilation report is then generated containing information on the total hardware resource usage. Figure 3.26 shows an example of the compilation report generated by Quartus, which reports the hardware resources used by the design models, such as logic utilization in ALMs (adaptive logic modules). The adaptive logic module is the basic building block of the Cyclone V device.

Figure 3.26: Compilation report


3.6 Performance Analysis

The execution time and accuracy measured for all five MLP models on the ARM processor, and the execution time, accuracy and hardware resource utilization measured for all five MLP models on the FPGA, will be recorded in tables and plotted in graphs. The execution time and accuracy of the MLP models on the ARM processor and on the FPGA will also be compared graphically. Discussion will then be made based on the tabulated data and the plotted graphs.

3.7 Summary

This chapter has discussed the methods for implementing the multilayer perceptron neural network on the processor, using a software solution, and in FPGA hardware. The structure of a neuron and its implementation have been explained briefly, and the types of activation function and their implementation in software and hardware have also been discussed. Several multilayer perceptron models with different numbers of inputs, numbers of hidden neurons and types of activation function have been defined for experimental purposes. The experimental procedure, which includes testing and measuring the performance of the MLP models on the processor and FPGA in terms of execution time, resource utilization and accuracy, has been explained. The results and analysis of the experiments are discussed in Chapter 4.


CHAPTER 4

RESULTS & DISCUSSION

4.1 Overview

After the implementation of the 5 MLP models on the ARM processor and on the FPGA using the methods explained in the previous chapter, experiments and testing were carried out to evaluate the performance of the MLP models on the ARM processor in terms of accuracy and execution time, and on the FPGA in terms of accuracy, execution time and resource utilization. All results have been tabulated, and comparisons of accuracy and execution time between the MLP models on the ARM processor and FPGA have been plotted in graphs. The results are discussed below, including the question of whether the ANN should be implemented on the ARM processor or on the FPGA.

4.2 Accuracy of MLP models in ARM processor

MLP Model 1

In the Linux terminal, as shown in Figure 4.1, the data are first entered into the ANN and the results are shown in the terminal. The remaining data are entered in the same way, and the results are tabulated in Table 4.1, which shows the results from MATLAB, the results from the ARM processor and the differences between them.


Figure 4.1: Results shown in Linux terminal

Table 4.1: Tabulated results for MATLAB and ARM processor

| Data | MATLAB Output1 | MATLAB Output2 | ARM Output1 | ARM Output2 | Difference 1 | Difference 2 | Difference² 1 | Difference² 2 |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.9920 | 0.0001 | 0.9920 | 0.0001 | 0 | 0 | 0 | 0 |
| 2 | 0.9898 | -0.0054 | 0.9898 | -0.0054 | 0 | 0 | 0 | 0 |
| 3 | 0.9919 | -0.0024 | 0.9919 | -0.0024 | 0 | 0 | 0 | 0 |
| 4 | 0.9914 | -0.0028 | 0.9914 | -0.0028 | 0 | 0 | 0 | 0 |
| 5 | 0.9931 | 0.0010 | 0.9931 | 0.0010 | 0 | 0 | 0 | 0 |
| 6 | 0.9929 | 0.0027 | 0.9929 | 0.0027 | 0 | 0 | 0 | 0 |
| 7 | 0.9937 | -0.0017 | 0.9937 | -0.0017 | 0 | 0 | 0 | 0 |
| 8 | 0.9916 | -0.0016 | 0.9916 | -0.0016 | 0 | 0 | 0 | 0 |
| 9 | 0.9911 | -0.0011 | 0.9911 | -0.0011 | 0 | 0 | 0 | 0 |
| 10 | 0.9902 | -0.0045 | 0.9902 | -0.0045 | 0 | 0 | 0 | 0 |
| 11 | -0.0031 | 0.9966 | -0.0031 | 0.9966 | 0 | 0 | 0 | 0 |
| 12 | 0.0105 | 0.9965 | 0.0105 | 0.9965 | 0 | 0 | 0 | 0 |
| 13 | -0.0022 | 0.9966 | -0.0022 | 0.9966 | 0 | 0 | 0 | 0 |
| 14 | -0.0040 | 0.9966 | -0.0040 | 0.9966 | 0 | 0 | 0 | 0 |
| 15 | -0.0007 | 0.9966 | -0.0007 | 0.9966 | 0 | 0 | 0 | 0 |
| 16 | 0.0004 | 0.9966 | 0.0004 | 0.9966 | 0 | 0 | 0 | 0 |
| 17 | -0.0050 | 0.9966 | -0.0050 | 0.9966 | 0 | 0 | 0 | 0 |
| 18 | 0.0001 | 0.9966 | 0.0001 | 0.9966 | 0 | 0 | 0 | 0 |
| 19 | 0.0127 | 0.9966 | 0.0127 | 0.9966 | 0 | 0 | 0 | 0 |
| 20 | 0.0035 | 0.9966 | 0.0035 | 0.9966 | 0 | 0 | 0 | 0 |
| Total difference squared | | | | | | | 0 | 0 |

For model 1, the total difference squared is zero for both outputs, so the MSE is zero, i.e. 100% accuracy.


MLP model 2

The tabulated results for model 2 are shown in Appendix A.1. The total difference squared for model 2 is zero for both outputs; thus, the MSE is zero, i.e. 100% accuracy.

MLP model 3

The tabulated results are shown in Appendix A.2. The total difference squared for model 3 is zero for both outputs; thus, the MSE is zero, i.e. 100% accuracy.

MLP model 4

The tabulated results are shown in Appendix A.3. The total difference squared for model 4 is zero for both outputs; thus, the MSE is zero, i.e. 100% accuracy.

MLP model 5

The tabulated results are shown in Appendix A.4. The total difference squared for model 5 is also zero for both outputs; thus, the MSE is also zero, i.e. 100% accuracy.

4.3 Accuracy of MLP models in FPGA

MLP model 1

After running the simulation using a test bench in ModelSim-Altera, the results for the first data point are as shown in Appendix A.5. The results are in 16-bit fixed-point format; they are converted to decimal numbers and recorded in Table 4.2. Output1 shows 0000001111110101, which is equal to 0.9893, and Output2 shows 1000000000000000, which is equal to 0. The rest of the results are tabulated in


Table 4.2, which shows the results from MATLAB and the FPGA, together with the squared differences between them.

Table 4.2: Tabulated results from MATLAB and FPGA

| Data | MATLAB Output1 | MATLAB Output2 | FPGA Output1 | FPGA Output2 | Difference 1 | Difference 2 | Difference² 1 (10⁻⁶) | Difference² 2 (10⁻⁶) |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.9920 | 0.0001 | 0.9893 | -0.0000 | 0.0027 | 0.0001 | 7.29 | 0.01 |
| 2 | 0.9898 | -0.0054 | 0.9873 | -0.0068 | 0.0025 | 0.0014 | 6.25 | 1.96 |
| 3 | 0.9919 | -0.0024 | 0.9893 | -0.0039 | 0.0026 | 0.0015 | 6.76 | 2.25 |
| 4 | 0.9914 | -0.0028 | 0.9893 | -0.0049 | 0.0021 | 0.0021 | 4.41 | 4.41 |
| 5 | 0.9931 | 0.0010 | 0.9912 | 0.0000 | 0.0019 | 0.0010 | 3.61 | 1.00 |
| 6 | 0.9929 | 0.0027 | 0.9912 | 0.0009 | 0.0017 | 0.0018 | 2.89 | 3.24 |
| 7 | 0.9937 | -0.0017 | 0.9922 | -0.0029 | 0.0015 | 0.0012 | 8.41 | 1.44 |
| 8 | 0.9916 | -0.0016 | 0.9893 | -0.0029 | 0.0023 | 0.0013 | 5.29 | 1.69 |
| 9 | 0.9911 | -0.0011 | 0.9893 | -0.0068 | 0.0018 | 0.0057 | 4.62 | 32.5 |
| 10 | 0.9902 | -0.0045 | 0.9883 | -0.0059 | 0.0019 | 0.0014 | 3.61 | 1.96 |
| 11 | -0.0031 | 0.9966 | -0.0003 | 1.0000 | -0.0028 | -0.0034 | 7.84 | 11.6 |
| 12 | 0.0105 | 0.9965 | 0.0117 | 1.0000 | -0.0012 | -0.0035 | 1.44 | 12.3 |
| 13 | -0.0022 | 0.9966 | -0.0029 | 1.0000 | 0.0007 | -0.0034 | 0.49 | 11.6 |
| 14 | -0.0040 | 0.9966 | -0.0039 | 1.0000 | -0.0001 | -0.0034 | 0.01 | 11.6 |
| 15 | -0.0007 | 0.9966 | -0.0029 | 1.0000 | 0.0022 | -0.0034 | 4.84 | 11.6 |
| 16 | 0.0004 | 0.9966 | -0.0029 | 1.0000 | 0.0033 | -0.0034 | 10.9 | 11.6 |
| 17 | -0.0050 | 0.9966 | -0.0039 | 1.0000 | -0.0011 | -0.0034 | 1.21 | 11.6 |
| 18 | 0.0001 | 0.9966 | -0.0029 | 1.0000 | 0.0030 | -0.0034 | 9.00 | 11.6 |
| 19 | 0.0127 | 0.9966 | 0.0186 | 1.0000 | -0.0059 | -0.0034 | 34.8 | 11.6 |
| 20 | 0.0035 | 0.9966 | 0.0029 | 1.0000 | 0.0006 | -0.0034 | 0.36 | 11.6 |
| Total difference squared | | | | | | | 124.03 | 167.16 |

The total difference squared is 124.03 × 10⁻⁶ for output 1 and 167.16 × 10⁻⁶ for output 2. Thus, the average total difference squared is (124.03 × 10⁻⁶ + 167.16 × 10⁻⁶) / 2 = 1.46 × 10⁻⁴. Dividing 1.46 × 10⁻⁴ by 20 gives an MSE of 7.3 × 10⁻⁶ for model 1 in the FPGA.
