DISSERTATION SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF COMPUTER SCIENCE DEGREE

Tekspenuh

(1)M. al ay. a. ACTIVITY RECOGNITION USING OPTIMIZED REDUCED KERNEL EXTREME LEARNING MACHINE (OPT-RKELM). U. ni. ve. rs i. ty. of. YANG DONG RUI. FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY UNIVERSITY OF MALAYA KUALA LUMPUR 2019.

(2) M. YANG DONG RUI. al ay. a. ACTIVITY RECOGNITION USING OPTIMIZED REDUCED KERNEL EXTREME LEARNING MACHINE (OPT-RKELM). ve. rs i. ty. of. DISSERTATION SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF COMPUTER SCIENCE DEGREE. U. ni. FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY UNIVERSITY OF MALAYA KUALA LUMPUR. 2019.

(3) UNIVERSITI MALAYA ORIGINAL LITERARY WORK DECLARATION. Name of Candidate: YANG DONGRUI Registration/Matric No.: Name of Degree:. MASTER OF COMPUTER SCIENCE. Field of Study: I do solemnly and sincerely declare that:. M al ay. ACTIVITY RECOGNITION USING OPTIMIZED REDUCED KERNEL EXTREME LEARNING MACHINE(OPT-RKELM). a. Title of Project Paper/Research Report/Dissertation/Thesis (“this Work”):. U. ni. ve. rs i. ty. of. (1) I am the sole author/writer of this Work; (2) This work is original; (3) Any use of any work in which copyright exists was done by way of fair dealing and for permitted purposes and any excerpt or extract from, or reference to or reproduction of any copyright work has been disclosed expressly and sufficiently and the title of the Work and its authorship have been acknowledged in this Work; (4) I do not have any actual knowledge nor do I ought reasonably to know that the making of this work constitutes an infringement of any copyright work; (5) I hereby assign all and every rights in the copyright to this Work to the University of Malaya (“UM”), who henceforth shall be owner of the copyright in this Work and that any reproduction or use in any form or by any means whatsoever is prohibited without the written consent of UM having been first had and obtained; (6) I am fully aware that if in the course of making this Work I have infringed any copyright whether intentionally or otherwise, I may be subject to legal action or any other action as may be determined by UM.. Candidate’s Signature. Date:. Subscribed and solemnly declared before,. Witness’s Signature. Date:. Name: Designation:. ii.

(4) ACTIVITY RECOGNITION USING OPTIMIZED REDUCED KERNEL EXTREME LEARNING MACHINE (OPT-RKELM) ABSTRACT. In the past decade, research related to Human Activity Recognition (HAR) based on devices. a. embedded sensors has shown good overall recognition performance. As a consequence,. al ay. HAR has been identified as a potential topic for healthcare assessment systems. One of the major research problems is the computation resources required by machine learning algorithm used for classification for HAR. Numerous researchers have tried different. M. methods to enhance the algorithm to improve performance, some of these methods. of. include Support Vector Machine (SVM), Decision Trees, Extreme Learning Machine (ELM), Kernel Extreme Learning Machine (KELM), and Deng’s Reduced Kernel Extreme. ty. Learning Machine (RKELM). However, unsatisfactory accuracy, slow learning speed, and. rs i. stability is still a problem. In this study, we have purposed a model named as Optimized. ve. Reduced Kernel Extreme Learning Machine (Opt-RKELM). It applies the characteristic of ReliefF algorithm to rank and select top scoring features for feature selection. ReliefF can. ni. solve the problem of large feature dimension in the existing RKELM. By using clustering. U. method K-Means, we have found the best center point position to calculate Kernel matrix. at last, we have employed Quantum-behaved Particle Swarm Optimization (QPSO) to get the optimal kernel parameter in the proposed model. To evaluate the effectiveness of Opt-RKELM, two benchmark datasets related to human activity recognition problems are used. The notable advantages of the proposed model are excellent recognition accuracy, fast learning speed, stable prediction ability, and good generalization ability. Keywords: Human Activity Recognition, Extreme Learning Machine, ReliefF, K-Means,. iii.

(5) Optimized Reduced Kernel Extreme Learning Machine, Quantum-behaved Particle Swarm. U. ni. ve. rs i. ty. of. M. al ay. a. Optimization.. iv.

(6) ABSTRAK. Dalam dekad yang lalu, penyelidikan yang berkaitan dengan Human Activity Recognition (HAR) berdasarkan peranti sensor terbenam telah menunjukkan prestasi pengiktirafan keseluruhan yang baik. Sebagai akibatnya, HAR telah dikenal pasti sebagai topik yang. a. berpotensi untuk sistem penilaian kesihatan. Satu masalah penyelidikan utama adalah. al ay. sumber perhitungan yang diperlukan oleh machine learning algoritma yang digunakan untuk klasifikasi HAR. Pelbagai penyelidik telah mencuba kaedah untuk meningkatkan algoritma untuk meningkatkan prestasi. Beberapa kaedah ini termasuk Support Vector. M. Machine (SVM), Decision Trees, Extreme Learning Machine (ELM), Kernel Extreme. of. Learning Machine (KELM), dan Deng’s Reduced Kernel Extreme Learning Machine (RKELM). Walau bagaimanapun, kerumitan komputasi tinggi, pembelajaran perlahan. ty. kelajuan, dan kestabilan masih menjadi masalahnya. Dalam kajian ini, kami telah membuat. rs i. satu model bernama Optimized Reduced Kernel Extreme Learning Machine (Opt-RKELM).. ve. Ia terpakai ciri ReliefF Algorithm untuk menentukan dan memilih ciri pemarkahan atas untuk ciri pemilihan. ReliefF dapat menyelesaikan masalah dimensi besar dalam RKELM. ni. yang sedia ada. Dengan menggunakan kaedah clustering K-Means, kami telah menemui. U. kedudukan titik pusat terbaik untuk hitung matriks Kernel. Pada yang terakhir, kami telah menggunakan Quantum-behaved Particle Swarm Optimization (QPSO) untuk mendapatkan parameter kernel optimum dalam model yang dicadangkan. Dua kumpulan penanda aras yang berkaitan dengan Human Activity Recognition digunakan untuk menilai keberkesanan Opt-RKELM. Kelebihan dari model yang dicadangkan adalah prestasi pengiktirafan yang baik, kos pengiraan yang rendah, kelajuan pembelajaran yang pantas, stabil keupayaan ramalan, kurang ketergantungan parameter, dan keupayaan generalisasi yang baik.. v.

(7) Keywords:. Human Activity Recognition of Daily Living, Reduced Kernel Extreme. Learning Machine, ReliefF, K-Means, Optimized Reduced Kernel Extreme Learning. U. ni. ve. rs i. ty. of. M. al ay. a. Machine, Quantum Particle Swarm Optimization.. vi.

(8) ACKNOWLEDGEMENTS. I would like to express my deepest appreciation to my supervisor, Prof Dr. Loo Chu Kiong for his guidance, patience and many invaluable comments and suggestions throughout the whole Master’s study and related research. His guidance helped me in all the time of research and writing of this dissertation. I could not have imagined having a better supervisor and mentor for my Master’s study.. al ay. a. I want to acknowledge University of Malaya and also thanks my friends and laboratory members for their insightful comments and encouragement, but also for the hard questions which inspired me to widen my research from various perspectives.. M. Finally, I must express my very profound gratitude to my family especially my parents for providing me with unfailing support and continuous encouragement throughout my. of. years of study and through the process of researching and writing this dissertation. This. U. ni. ve. rs i. ty. accomplishment would not have been possible without them. Thank you.. vii.

(9) TABLE OF CONTENTS. Abstract ......................................................................................................................... iii Abstrak ........................................................................................................................... v. Acknowledgements ....................................................................................................... vii Table of Contents .......................................................................................................... viii List of Figures ............................................................................................................... xi. al ay. a. List of Tables................................................................................................................. xiii List of Symbols and Abbreviations............................................................................... xiv. M. List of Appendices ........................................................................................................ xv. 1. 1.1. Background ........................................................................................................... 1. 1.2. Problem statements ............................................................................................... 2. 1.3. Research Objective................................................................................................ 1.4. Proposed Method .................................................................................................. 3. 1.5. Research Contributions ......................................................................................... 4. 1.6. Thesis Outline ....................................................................................................... 5. 3. U. ni. ve. rs i. ty. of. CHAPTER 1: INTRODUCTION ............................................................................. CHAPTER 2: BACKGROUND AND LITERATURE REVIEW .......................... 7. 2.1. Introduction........................................................................................................... 7. 2.2. Classification Methods......................................................................................... 12. 2.3. 2.2.1. Basic Algorithm of Extreme Learning Machine Algorithm ................... 12. 2.2.2. Kernel Extreme Learning Machine Algorithm ....................................... 15. Pre-processing...................................................................................................... 17. viii.

(10) 2.4. 2.3.1. Feature Selection ..................................................................................... 17. 2.3.2. Clustering Method................................................................................... 19. Optimization Method ........................................................................................... 19 2.4.1. Particle Swarm Optimization Algorithm................................................. 20. Research Gaps...................................................................................................... 21. 2.6. Summary.............................................................................................................. 22. a. 2.5. al ay. CHAPTER 3: METHODOLOGY ........................................................................... 24 Introduction.......................................................................................................... 24. 3.2. System Overview ................................................................................................. 25. 3.3. Preprocessing Methods ........................................................................................ 26 Feature Selection - ReliefF Algorithm .................................................... 26. 3.3.2. Clustering Method - K-Means Algorithm .............................................. 29. of. 3.3.1. 3.4.1. ty. Classification Method .......................................................................................... 32 Extreme Learning Machine with Reduced Kernel Algorithm ................ 33. rs i. 3.4. M. 3.1. Optimization - Quantum-behaved Particle Swarm Optimization Algorithm ...... 35. 3.6. Summary.............................................................................................................. 38. ni. ve. 3.5. U. CHAPTER 4: EXPERIMENTS............................................................................... 39 4.1. Introduction.......................................................................................................... 39. 4.2. Datasets ................................................................................................................ 39 4.2.1. Human Activity Recognition Using Smartphones Dataset .................... 40. 4.2.2. Human Activities and Postural Transitions Dataset ................................ 42. 4.3. Environment......................................................................................................... 43. 4.4. Process with Dataset ............................................................................................ 43. ix.

(11) 4.5. 4.4.1. Data Normalization ................................................................................. 43. 4.4.2. Data Split................................................................................................. 44. 4.4.3. Features ................................................................................................... 44. Performance Measures......................................................................................... 44 4.5.1. Summary.............................................................................................................. 47. a. 4.6. Accuracy.................................................................................................. 46. al ay. CHAPTER 5: RESULT AND DISCUSSION ......................................................... 48 Introduction.......................................................................................................... 48. 5.2. Performance Comparison..................................................................................... 48. 5.3. Classification Report............................................................................................ 50. 5.4. Confusion Matrix................................................................................................. 51. 5.5. Review Contribution ............................................................................................ 51. 5.6. Summary.............................................................................................................. 52. rs i. ty. of. M. 5.1. ve. CHAPTER 6: CONCLUSION AND FUTURE WORKS ...................................... 54 Conclusions.......................................................................................................... 54. 6.2. Future Works........................................................................................................ 55. ni. 6.1. U. References ..................................................................................................................... 56 Appendices.................................................................................................................... 65. x.

(12) LIST OF FIGURES. Figure 2.1: Example of a commercial smartphone and some of its features. .............. 8. Figure 2.2: ELM Architecture. L Random Hidden Neurons (which need not be algebraic sum based) or other ELM feature mappings. Different type of output functions could be used in different neurons, d are Input Nodes 14. a. Figure 3.1: Overview flow of Opt-RKELM................................................................ 25. al ay. Figure 3.2: Human Activity Recognition Using Smartphones Dataset from the ReliefF algorithm sorted by relevance. The higher features are more. M. important. ................................................................................................ 27 Figure 3.3: Human Activities and Postural Transitions Dataset from ReliefF. of. algorithm sorted by relevance. The higher features are more. ty. important. ................................................................................................ 28 Figure 3.4: Pseudo code of ReliefF algorithm. .......................................................... 29. rs i. Figure 3.5: Flowchart of K-Means .............................................................................. 31. ve. Figure 3.6: Flowchart of QPSO. ................................................................................. 36 Figure 3.7: Using QPSO to optimize error rate, after 10 iterations until 50. U. ni. iterations stabled at lower than 0.01 error rate ........................................ 38. Figure 4.1: Overview of Activities.............................................................................. 39 Figure 4.2: Example of the performed basic activities during the collection of experimental data. From left to right and top to bottom: standing, sitting, lying down, walking, walking downstairs and walking upstairs... 41 Figure 4.3: Postural Transitions activities: stand to sit, sit to stand, stand to lie, lie to sit, sit to lie, lie to stand. ................................................................ 43. xi.

(13) Figure 5.1: Confusion Matrix Heatmap for HARUS Dataset .................................... 52. U. ni. ve. rs i. ty. of. M. al ay. a. Figure 5.2: Confusion Matrix Heatmap for HAPT Dataset ....................................... 53. xii.

(14) LIST OF TABLES. Table 4.1: HARUS datasets information..................................................................... 42 Table 4.2: List of measures for computing feature vectors. ........................................ 45 Table 4.3: The four fundamental numbers for estimating statistical performance measures of a classifier .............................................................................. 46. a. Table 5.1: Results for HARUS datasets. ..................................................................... 48. al ay. Table 5.2: Results for HAPT datasets. ........................................................................ 48 Table 5.3: Classification Report for HARUS datasets................................................. 50. M. Table 5.4: Class ID from A1 -A6 are: walking, walking upstairs, walking downstairs, sitting, standing, laying. .......................................................... 50. of. Table 5.5: Classification Report for HAPT datasets. .................................................. 50. ty. Table 5.6: Class ID from A1 -A12 are: walking, walking upstairs, walking downstairs, sitting, standing, laying, stand to sit, sit to stand, sit to lie,. rs i. lie to sit, stand to lie, lie to stand................................................................ 50. ve. Table 5.7: Confusion Matrix for HARUS datasets...................................................... 51 Table 5.8: Class ID from A1 -A6 are: walking, walking upstairs, walking. ni. downstairs, sitting, standing, laying. .......................................................... 51. U. Table 5.9: Confusion Matrix for HAPT datasets. ....................................................... 53 Table 5.10: Class ID from A1 -A12 are: walking, walking upstairs, walking downstairs, sitting, standing, laying, stand to sit, sit to stand, sit to lie, lie to sit, stand to lie, lie to stand................................................................ 53. xiii.

(15) LIST OF SYMBOLS AND ABBREVIATIONS. K-Means : k-NN : KELM : Opt-RKELM :. :. U. ni. ve. rs i. SVM. : : : : :. ty. PSO QPSO ReliefF RKELM SLFN. a. : :. al ay. HAR HARUS. Ant Colony Optimization. Back-Propagation Neural Network. Extreme Learning Machine. Feedforward Neural Network. Genetic Algorithm. Smartphone-Based Recognition of Human Activities and Postural Transitions. Human Activity Recognition. Human Activity Recognition Using Smartphones. K-Means. k-Nearest Neighbor. Kernel Extreme Learning Machine. Optimized Reduced Kernel Extreme Learning Machine. Particle Swarm Optimization. Quantum-behaved Particle Swarm Optimization. ReliefF. Reduced Kernel Extreme Learning Machine. Single Hidden Layer Feedforward Neural Network. Support Vector Machine.. M. : : : : : :. of. ABCO BPNN ELM FNN GA HAPT. xiv.

(16) LIST OF APPENDICES. Code for Opt-RKELM ......................................................................... 65. U. ni. ve. rs i. ty. of. M. al ay. a. Appendix A:. xv.

(17) CHAPTER 1: INTRODUCTION. 1.1. Background. Human activity recognition includes a series of limb movements with rich meanings. It is a way of expressing people’s behavioral intentions or completing the information transmission between people and the environment. Human activity recognition refers to the process of computer automatic detection, analysis and understanding of various sports and. al ay. a. behaviors of the human body, and even take actions or responses. It has broad application prospects in human-computer interaction, rehabilitation engineering, education, game, teleconference, sports, healthcare, elderly care, and personalized recommendation.. M. A number of activity identification methods have been established that use specific purposes hardware devices such as Dijkstra, Kamsma, and Zijlstra (2010) or sensor body. of. networks (Mannini & Sabatini, 2010; Altun, Barshan, & Tunçel, 2010). Although the. ty. use of numerous sensors can improve the performance of recognition algorithms, it is. rs i. unrealistic to expect the general public to use them for everyday activities due to the difficulty and time required to wear them.. ve. With the development of microelectronics systems, accelerometers have been minia-. ni. turized so that they can be embedded in small mobile devices, This integration affords. U. researchers with a platform for the classification of a group of daily activities such as standing, walking, laying, walking, walking up, and downstairs. In this process, the inertial body signal is processed using a supervised machine learning algorithm of limited resource hardware. In recent years, with the rapid development of smartphone technology and the maturity of sensor technology, the application of sensor technology has become more and more extensive, and the role played by activity recognition has become increasingly important.. 1.

(18) For example, sensor technology can be easily applied to smart home environments: activity reminders, fall detection, rehabilitation guidance, health assessment, and more. In health assessment, a person’s ability to perform activities in daily life is usually closely related to his or her health. For example, people with Alzheimer’s disease are characterized by sitting or lying for a long time and having sleep disorders. Therefore, capturing the location of these activities, the duration, frequency of occurrence, etc. can be used to preliminarily. a. infer whether the active person has a certain condition. Compared to traditional wearable. al ay. activity recognition (Mital, Smith, Hill, & Henderson, 2011), the development of an activity recognition application using a smartphone has several advantages, such as device. 1.2. M. portability, no additional fixed equipment, and lack of discomfort.. Problem Statements. of. One drawback of the smartphone-based approach is that energy and services on the. ty. mobile phone are shared with other applications. This problem becomes critical in mobile. rs i. phone devices which have limited resources. Y. Chen, Zhao, Wang, and Chen (2012) uses the ELM algorithm based fast and robust human activity recognition model. Huang, Zhou,. ve. Ding, and Zhang (2012) proposed KELM, can achieve very high accuracy in classification. ni. problem. W.-Y. Deng, Zheng, and Wang (2014) introduced RKELM as one solution to. U. this problem. RKELM solves the problem of huge computation of Kernel by randomly selecting 10% of the input data to calculate the Kernel Matrix. This process was able to reduce the use of computational resources, and computational time. But, randomly selecting the 10% input data also creates an unstable issue that reduces the recognition rate. Because activity datasets can sometimes have a lot of data. If these data are not properly selected for classifier, the activity recognition accuracy will be severely degraded. Furthermore, optimal parameter choice is required for the RKELM model to recognize activity with high accuracy. In view of this, the following research problems have been 2.

(19) identified as follows:. • Less relevant features reduce accuracy and slow down the performance Opt-RKELM in activity recognition. • The random sampling used in RKELM degrades the overall performance of the activity recognition model. • Kernel parameter dependency problem of the RKELM affects the performance of. al ay. 1.3. a. the model.. Research Objective. Base on the problem statements list in section 1.2, the research objective of this study. M. are given as follows:. of. • To reduce high-dimensional feature space from large datasets by using an appropriate. ty. feature selection method ReliefF.. rs i. • To replace random sampling in the RKELM by using an unsupervised machine learning technique K-Means in order to solve the lack of deterministic of classification.. ve. • To determine the best parameter for kernel using an optimization technique Quantum-. ni. behaved Particle Swarm Optimization.. U. The three objectives listed above are matched with the three problem statements. respectively. Therefore, the main aim of this study is to implement a system called. Optimized Reduced Kernel Extreme Learning Machine (Opt-RKELM) for Human Activity Recognition. 1.4. Proposed Method. The proposed method will be split into three stages, pre-processing stage, machine learning stage and optimization stage. In the first stage is the pre-processing stage. There. 3.

(20) have two steps in the preprocessing stage. The first step is using the ReliefF to select the important features from the available input features. After these input features had been filtered out, the next step is to perform the clustering method by using k-mean. This clustering process can find out the important data points for the Kernel matrix function calculation. After the pre-processing stage, the next stage is the classification stage. Opt-RKELM. a. used to train and recognize the human activities. compare to Deng’s RKELM, the input. al ay. data points for the kernel matrix function calculation will be replaced by the k-mean data points. By doing this replacement able to reduce the training speed and improve. M. the recognition stability. The final stage is the optimization stage. In the optimization stage, QPSO is proposed and implemented to adjust the RKELM parameters. By doing. Research Contributions. rs i. 1.5. ty. recognition rate.. of. this, it able to solve the RKELM parameter dependencies problems and improve the high. The contributions of this thesis can be summarized as follow: In the first place,. ve. an Optimized-Reduced Kernel Extreme Learning Machine (Opt-RKELM) for Human. ni. Activity Recognition (HAR) system was proposed. To achieving this, at-first, ReliefF. U. were implemented to select the relevant and important features from the available input features. These input features are used to train the Reduce Kernel-Extreme Learning Machine (RKELM) for the human activity recognition. In order to speed up the training process and increase the stability of the recognizing result, another level of pre-processing was proposed and implemented. Before the input features pass to the RKELM for training, these input features will go through a clustering method, K-mean, to select the important data to build-up the kernel matrix in the RKELM. By doing this, the training process will become more faster and stable. In-order to optimize 4.

(21) the proposed system and solve the RKELM parameter dependencies issues, QPSO were proposed and implemented. To validate the proposed Opt-RKELM, two benchmark datasets were selected for validation. Experiments have shown that the proposed Opt-RKELM model can achieve better performance than other existing models. As a summary, the main contribution for this thesis is the proposed Opt-RKELM model. a. is performs better than the RKELM model (W.-Y. Deng et al., 2014), and use even less. al ay. training time. Opt-RKELM model can run very fast using less computation resource. This research is prepare for produce a Journal publication and a draft is being prepared under. M. the title "Activity recognition using optimized reduced kernel extreme learning machine. 1.6. Thesis Outline. of. (Opt-RKELM)".. ty. The rest of the thesis is organized as follows: This thesis is organized into six parts:. rs i. Introduction and Literature Review, Methodology and Experiments, Result and Discussion, and Conclusion and Future Works.. ve. Chapter 2 Literature Review presents an overview of basic concepts, reviews the. ni. fundamentals of a Human Activity Recognition system including the data-processing,. U. feature extraction, classification, and optimization technique. Furthermore, a summary of comparisons between several previous research works is presented. Chapter 2 presents the reader with useful background knowledge about a Human Activity Recognition. Chapter 3 Methodology formally defines the architecture of the proposed Opt-RKELM system. It presents studies that fulfill the research problems proposed in section(1.2), Starting with the data collection procedure, feature extraction methods and data preprocessing are presented. Then, the classification is present, finally, the effect of parameters is used QPSO to find best fit value. 5.

(22) Chapter 4 Experiments presents the implementation of the experiments, including the explanation of the of the available database and the tools used to implement the Opt-RKELM system. Chapter 5 Result and Discussion presents and discusses the experiment results conducted through the experiments including the comparison with others classifier. In the end, Conclusions and suggests some Future research directions of this work are. U. ni. ve. rs i. ty. of. M. al ay. a. presented in Chapter 6 Conclusion and Future Works.. 6.

(23) CHAPTER 2: BACKGROUND AND LITERATURE REVIEW. 2.1. Introduction. In the past few years, activity recognition has become an emerging field of research and one of the challenges for pervasive computing. Activity recognition researchers have explored different types of sensing technologies. In general, sensor technology can be broadly classified into three categories: (1) vision-based methods(Jalal, Uddin, & Kim,. al ay. a. 2012); (2) ambient sensor-based methods(Ning, Shi, Zhu, Li, & Chen, 2019); and (3) wearable sensor-based methods(A. Wang, Chen, Yang, Zhao, & Chang, 2016). The vision-based methods primarily use image or video to monitor and identify. M. different activities. In well-controlled environment, the correct rate of vision-based activity recognition is relatively high, but in many special environments, such technologies. of. have serious privacy problems, and this method is often limited by lighting changes,. ty. environmental occlusion, and background changes (Chaquet, Carmona, & Fernández-. rs i. Caballero, 2013; Aggarwal & Xia, 2014; Hernández, Cabido, Montemayor, & Pantrigo, 2014; Yin, Tian, Feng, & Li, 2014).. ve. Based on ambient sensor-based methods to determine activity by capturing the interaction. ni. between human and object. For example, if a sensor embedded in a chair is triggered, it. U. is inferred that the monitored person may be sitting in a chair; If it is detected that the sensor placed on the bed is active for a long time, it is inferred that the monitored person may be in a sleep state. This method does identify human activities in daily life, such as washing, eating, sleeping, and watching TV, but this method is usually expensive and limited to indoor scenes (L. Chen & Khalil, 2011, 2011). In addition, for the entire system to work properly, the establishment and maintenance of the system are relatively complex, such as how to effectively deploy these sensors and devices in appropriate places without. 7.

(24) inconvenience to users. Compared to the above two methods, the wearable sensor-based method is very flexible. Wearable sensors can be adapted to many parts of the body, such as the head, arms, wrists, legs, ankles, or even pockets. They are suitable for both indoor and outdoor environments, and can be worn with multiple devices at the same time. Among these wearable devices, smartphone are particularly powerful, with most of the additional sensing components. a. available to users, and with powerful computing and communication capabilities. However,. al ay. there are still many difficulties to be solved in how to learn a large number of samples, let the computer learn faster, and finally accurately identify at high speed. Therefore,. M. smartphone sensor activity recognition is a valuable and challenging area of research.. U. ni. ve. rs i. ty. of. Figure 2.1 shows an smartphone and lists some of its features.. Figure 2.1: Example of a commercial smartphone and some of its features.. Among the development of micro-electrical mechanical systems (MEMS), the sensors including accelerometers ,gyroscope ,and magnetometer are miniaturized that can be 8.

(25) embedded into smartphone. This is why activity recognition has become an increasingly popular topic in recent years. Many studies focus on activity recognition software built for mobile applications (Brezmes, Gorricho, & Cotrina, 2009; L. Sun, Zhang, Li, Guo, & Li, 2010; Anjum & Ilyas, 2013). Mobile accelerometer-based activity recognition(Taylor, 2009) has received much attention for its wide range of applications in healthcare, personalized recommendations,. a. advertising services, etc.(Cambria & Hussain, 2012; Mital et al., 2011; Wöllmer, Eyben,. al ay. Graves, Schuller, & Rigoll, 2010) In the work of Ravi, Dandekar, Mysore, and Littman (2005) and Ward, Lukowicz, and Gellersen (2011), the authors used multiple accelerometers. M. to classify different activities.. Y. Chen, Qi, Sun, and Ning (2010)used smartphones to detect six activities in order to. of. find a state switching point. These models can achieve high recognition accuracy because their test and training samples are from the same batch of samples and follow the same. ty. distribution.. rs i. Kwapisz, Weiss, and Moore (2011) collected and labeled daily activities, including. ve. walking, jogging, stairs, standing, sitting in 29 subjects. They discussed features of each activity and compared performances of using J48, Logistic Regression, Multilayer. ni. Perceptron and Straw Man with 10-fold validation. The features they selected are mean,. U. standard deviation, average absolute difference, average resultant acceleration, the time between peaks and binned distribution. It must be wise to choose features, because the features quality can affect performance. Cheng et al. (2013) proposed a novel active data preprocessing method in their work.. The collected accelerometer readings are first converted to a fixed Earth coordinate system using the gravity vector collected from the sensors. A 10Hz low pass filter is then applied to eliminate noise and improve the quality of the signal collected by the phone that is. 9.

(26) loosely attached to the body. Murad and Pyun (2017) Adopting deep learning methods for human activity recognition, propose the use of deep recurrent neural networks (DRNNs) for building recognition models. In experimental, compared with ELM (Kumar, Bharadwaj, Sumukha, & George, 2016), CNN (Jiang & Yin, 2015), SVM (Anguita, Ghio, Oneto, Parra, & Reyes-Ortiz, 2013), random forest(Murad & Pyun, 2017), HMMs (Zappi et al., 2008), DBNS (Murad. a. & Pyun, 2017), and k-nearest neighbors (KNN)(Hammerla, Kirkham, Andras, & Ploetz,. al ay. 2013) with several databases include HARUS, achieved 96.7% of accuracy.. Traditional neural network learning algorithms, such as the most representative Back-. M. Propagation Neural Network (BPNN) algorithm in the Single Hidden Layer Feedforward Neural Network (SLFN), In the learning process, a large number of network training. of. parameters need to be set to continuously adjust the weight and threshold of the network, so that the square error of the network is minimized, and the training time is often several. ty. hours or even several days. In this regard, in 2004, Professor Huang GuangBin developed. rs i. an ELM algorithm for solving single hidden layer neural networks (Huang, Zhu, & Siew,. ve. 2006). The so-called “Extreme” refers to the limitations of transcending traditional artificial learning methods, breaking the barrier between traditional artificial learning. ni. and biological learning mechanisms, and learning from higher-level brains (Huang et al.,. U. 2012). The ELM algorithm is based on neural network generalization theory, control theory, matrix theory and linear system theory, and represents a set of machine learning theories that do not need to adjust hidden layer nodes. The presentation of ELM is of epoch-making significance. Because for decades, there has been a controversial issue in the fields of neural networks, machine learning, and neuroscience: Whether the hidden layer nodes need to be adjusted during the learning process, ELM theory proves that for most neural networks and learning algorithms, it is. 10.

(27) not necessary to iteratively adjust the hidden layer nodes. In addition, unlike traditional neural networks, ELM only needs to set the number of hidden layer nodes of the neural network throughout the learning process, without adjusting the input weight of the network and the bias of the hidden layer nodes, in order to ensure high learning accuracy, The speed of learning has been greatly improved. In the past few years, ELM has demonstrated excellent learning accuracy and speed. a. in many applications due to its outstanding structure, fast training, strong generalization. al ay. ability and strong general classification ability. Such as face recognition (Mohammed, Minhas, Wu, & Sid-Ahmed, 2011), image segmentation (Pan, Park, Yang, & Yoo, 2012),. M. human activity recognition (Minhas, Baradarani, Seifzadeh, & Wu, 2010). Different from the traditional neural network model, ELM only requires the infinite order of the activation. of. function. The input weight of the hidden layer node and the hidden layer “bias” in the ELM network are randomly generated, and it is not necessary to iteratively adjust these. ty. parameters during the execution of the algorithm. The output weight of the network can. rs i. be solved by a single calculation.. ve. ELM, due to its efficacy, has drawn a significant amount of interest from researchers in various fields, such as face recognition (FR) (Mohammed et al., 2011; Zong & Huang,. ni. 2011), handwritten character recognition (Chacko, Krishnan, Raju, & Anto, 2012), action. U. recognition (Iosifidis, Tefas, & Pitas, 2013), and activity recognition (W.-Y. Deng et al., 2014). Different versions of improved ELM have been proposed. Inspired by Mercer condition, a kernel ELM (KELM) was proposed for robust classification (Huang et. al., 2012). (L. Zhang & Zhang, 2017) compared different versions of ELM perform classification, KELM shows an obvious superiority with 5% improvement in recognition compared with the conventional ELM.. 11.

(28) 2.2. Classification Methods. Feedforward Neural Network (FNN), especially BPNN, have been widely used in recent years. However, traditional BPNN has a slow convergence rate and convergence to local algorithms. In essence, the parameter is an optimization problem in order gradient method, It has a slow convergence speed and convergence to a local minimum problem. Even improved FNN have faster training speeds and better generalization capabilities than. a. BPNN, but they still have not to get the global optimal solution.. al ay. In order to solve the above problems, Huang, Zhu, and Siew (2004) proposed the ELM method to train the SLFN in 2004. This chapter mainly introduces the main ideas and. M. develop algorithms of Extreme Learning Machine, KELM and RKELM.. In ELM, hidden nodes are initialized randomly and are not adjusted during the entire. of. training process. The only parameter that needs to be learned is the weight between the hidden layer and the output layer. In this way, ELM can be attributed to a parametric linear. ty. model that solves linear classification problems. Compared with the traditional FNN, ELM. Basic Algorithm of Extreme Learning Machine Algorithm. ve. 2.2.1. rs i. is more efficient and tends to get the best solution in the whole situation.. ni. Although the neural network has been widely used in different fields, the traditional. U. feedforward network also has its own advantages, but it also has the following disadvantages: It takes a long time to repair the weights and thresholds in network training. The use of gradient descent has the drawback of falling into local minimum and thus does not reach the global minimum. The ELM algorithm greatly improves the learning speed of the SLFN and is widely used in various fields. In recent years, extreme learning machines have gained more and more scholars’ research by virtue of their fast learning speed and good learning effect. Extreme Learning 12.

(29) Machine, a fast learning algorithm for SLFN, originally proposed by Huang et al. (2004), ELM randomly generates hidden node parameters, then, analyzes and determines output weight instead of iterative adjustment. As a result, ELMs run fast, are easy to implement, and appear to outperform other classifiers. ELM diagram shown in Figure 2.2, red curly brackets can apply Feature learning, Clustering, Regression, and Classification. The current research mainly focuses on the improvement and development of ELM’s own. a. algorithm theory and the expansion of ELM practical application field. The practical. al ay. application range of ELM has been expanding in recent years and has been widely used in time series prediction, sales forecasting(Z.-L. Sun, Choi, Au, & Yu, 2008), face. M. recognition(Mohammed et al., 2011), power system, control engineering, fault diagnosis and prediction, data analysis, large-scale data processing, and image processing. The. of. achievements in the theoretical research of ELM algorithm are mainly based on the defects of the model and different fields and problems. ELM algorithms are proposed, which can. ty. be summarized as the following:. rs i. For N arbitrary different samples (x i, t i ), where x i = [x i1, x i2, · · · , x in ]T ∈ Rn , t i =. ve. [t i1, t i2, · · · , t im ]T ∈ Rm , Then the output of a FNN with L hidden nodes and active function. ni. g(x) can be expressed as:. U. L X i=1. βi gi (x j ) =. L X. βi g(wi · x j + bi ) = o j ,. j = 1, · · · , N,. (2.1). i=1. In Equation (2.1), wi = [wi1, wi2, · · · , win ]T is the weight vector connecting the ith. hidden neuron and the input neurons, wi · x j denotes the inner product of wi and x j . Where βi = [ βi1, βi2, · · · , βim ]T is the weight vector connecting the ith hidden node and the output node, bi is threshold bias of the ith hidden node and o j is the output vector of the input sample x j . Activation function g(x) can choose the "Sigmoid function", as well as the "radial basis", "sine", "cosine", "exponential", and many other nonregular 13.

(30) a al ay M. of. Figure 2.2: ELM Architecture. L Random Hidden Neurons (which need not be algebraic sum based) or other ELM feature mappings. Different type of output functions could be used in different neurons, d are Input Nodes. ty. functions(Huang & Babri, 1998) .. rs i. If this L hidden nodes with g(x) activation function SLFN can approximate those N. ve. samples with no error, will have:. ni. L X. oj − tj = 0. (2.2). j=1. U. For input sample x j , t j is the sample class label vector. consequently, there exist βi , wi. and bi such that:. L X. βi g(wi · x j + bi ) =t j ,. j = 1, · · · , N,. (2.3). j=1. This can be written as:. 14.

(31) *.tT +/ *. βT +/ *. g(w1 · x1 + b1 ) · · · g(w L · x1 + bL ) +/ .. 1 // .. 1 // .. // .. .. // // .. .. .. ... = ... ... /// . . .. . // .. // .. // .. // .. // . T/ t βTL g(w1 · x N + b1 ) · · · g(w L · x N + bL ) , - N×L , - L×m , L - N×m. (2.4). The above equations (2.4) can be written compactly as. H β = T,. a. (2.5). M. al ay.    h(x1 )  *. g(w1 · x1 + b1 ) · · · g(w L · x1 + bL ) +/  //  ..   . // is hidden layer output . . . . .. .. .. Since H =  ..  = .. //   .. //   . h(x ) g(w · x + b ) · · · g(w · x + b ) 1 N 1 L N L  N  , matrix, the input weight vectors wi and the hidden biases bi are randomly chosen, To train. of. an SLFN is simply equivalent to finding a least-squares solution β̂ of the linear system. ty. H β = T:. rs i. k H β̂ − T k. = min k H β − T k . β. (2.6). β̂ = H+ T,. (2.7). U. ni. ve. The smallest equivalent to finding norm least-square solution is:. The H+ is inverse of the matrix H generalized by using Moore-Penrose, Matrix H is. hidden layer output matrix.. 2.2.2. Kernel Extreme Learning Machine Algorithm. ELM not only has computational efficiency, but also tends to achieve similar or better generalization performance than SVM (Huang, Ding, & Zhou, 2010). However, because of the randomly assigned input weight and bias, ELM can produce a large variation in. 15.

(32) classification accuracy with the same number of hidden nodes. On 2010 and 2012, Huang et al. (2012) combined with the learning principle of support vector machine, and proposed KELM. the Kernel Extreme Learning Machine (KELM), which replaces the ELM hidden layer with kernel function, is proposed to solve this problem. It is worth noting that the kernel function used in KELM does not need to satisfy Mercer’s theorem and KELM provides a unified solution to multiclass classification problems. KELM improved the. a. generalization ability of ELM, by replacing the random map with a kernel function.. al ay. Talk about stability, the hidden layer output matrix of KELM calculated by training samples through a kernel mapping is not associated with L, so the recognition rate of. model parameter and hidden neurons.. M. KELM is unchanged. We see that the proposed method is more robust to the variation of. rs i. ty. projection method:. of. The methods to calculate Moore-Penrose generalized inverse of a matrix is the orthogonal. H† = HT (HHT ). −1. (2.8). ve. From (2.8), Huang et al. (2012) suggested adding a positive value I/C (regularization. ni. coefficient) to help calculate the output weights. According to the ridge regression theory,. U. where C is used to fix the over-fitting problem:. β = HT (. −1 I + HHT ) T C. (2.9). Since the output function for the ELM is:. f (xi ) = h(xi ) β. (2.10). h(x i ) is the output of the hidden nodes, H is hidden layer output matrix. Put Equation 16.

(33) (2.9) and Equation (2.10), the output function defined as follows:.    h(xi )h(x1 )T    −1  ( I + HHT ) T  . .. f (xi ) =   C    T  h(xi )h(x N ) . (2.11). a. Define a kernel function k as:. (2.12). al ay. K(xi, x1 ) = h(xi )h(x1 )T = HHT Thus, the output function of KELM can be written as:. M. T. (2.13). Pre-processing. rs i. 2.3. ty. of. *. K(xi, x1 ) +/ // .. −1 −1 // ( I + K) T . . T I T .. f (xi ) = h(xi )H ( + HH ) T = .. // C C .. // . K(xi, x N ) ,. ve. Data pre-processing is one of the most important steps in the data mining process. In activity recognition system, Pre-processing usually can be separate to feature selection and. ni. data sample selection, to improve efficiency and accuracy to further achieve the ultimate. U. goal of improving overall performance.. 2.3.1. Feature Selection. Feature selection consists of selecting a subset of relevant features from the original feature set (Guyon & Elisseeff, 2003). To differentiate between samples, classification algorithms need representative features. Using inappropriate or redundant features may deteriorate the performance of a classification algorithm.. 17.

(34) Activity recognition problem is type of classification problem, both using classifier separate different human states. Emotion recognition is very similar to activity recognition, As we know, the ReliefF algorithm is a widely used feature selection method. J. Zhang et al. (2016) propose a ReliefF-based selection methods with SVM classifier, The results show the effectiveness of ReliefF as a feature selection algorithm. Similar results have also been reported in other literatures (Schmidt & Trainor, 2001; W.-L. Zheng & Lu, 2015),. a. which prove the significance of ReliefF weight to some extent.. al ay. Kira and Rendell (Gupta, 2014) came up with a Relief algorithm in 1992 for a general problem with a high number of features. In the same year, (Gupta & Dallas, 2014) used. M. ReliefF and sequential forward floating search (SFFS) to solve an activity recognition problem, combine with Naive Bayes and k-nearest neighbor (k-NN), results shown ReliefF. of. can select relevant and robust features. Kononenko (Kononenko, 1994) improved the basic Relief algorithm, into ReliefF, by improving noise immunity and introducing support for. ty. multi-class problems. The Relief and ReliefF algorithms use a statistical approach rather. rs i. than heuristic search for finding the feature subset. ReliefF provides a relevance weight to. ve. each of the potential feature and the ones above a set relevance threshold are selected. Capela, Lemaire, and Baddour (2015) selected features using three filter-based, classifier-. ni. independent, feature selection methods (ReliefF, Correlation-based Feature Selection(CFS),. U. Fast Correlation Based Filter(FCBF)). ReliefF has been reported to be useful in cases with. strong interdependencies between fields (Liu & Motoda, 2007), the accuracy achieved using the ten highest ranked features was within 5% of the maximum accuracy achieved.. Thus, subsets of the top 10 ranked features were used to compare populations. the results have shown ReliefF is a good feature selection method.. 18.

(35) 2.3.2. Clustering Method. Clustering or cluster analysis is the task of assigning a set of objects into clusters, so that the objects in the same cluster are more similar to each other than to those in other clusters. In data mining, clustering is always attracting the attention of researchers, many cluster models include hierarchical clustering (Sibson, 1973; Defays, 1977), density based clustering (Ester, Kriegel, Sander, Xu, et al., 1996; Roy & Bhattacharyya, 2005),. a. centroid-based clustering(Lloyd, 1982), and clustering models related to statistics (Moon,. al ay. 1996). In order to take advantage of feature mapping technology, kernel method has been used in clustering (L. Zhang, Zhou, & Jiao, 2002; Girolami, 2002; Camastra & Verri, 2005),. M. and better results have been obtained. As an explicit feature mapping technique, ELM feature mapping (Huang & Chen, 2007, 2008) is more convenient than the kernel-based. of. method. Therefore, it is meaningful to use clustering technology to replace random select for RKELM. compare to select center point randomly in RKELM, by using a clustering. ty. method to select center point will improve the performance of the classifier. We believe. rs i. that replacing the RKELM kernel with the K-Means calculated center point kernel can. Optimization Method. ni. 2.4. ve. obtain more satisfactory classification and regression results.. U. Different studies use evolutionary algorithms to search the parameters of the classifier,. which are used to build classification models with high prediction accuracy and stability. For example, in SVM, many optimization algorithms are used to optimize SVM parameters, such as penalty parameters and kernel parameters that control the complexity and accuracy of the prediction model(Subasi, 2013; Tharwat & Hassanien, 2018).. 19.

(36) 2.4.1. Particle Swarm Optimization Algorithm. The particle swarm optimization algorithm is a new evolutionary algorithm. It starts from the random solution and finds the individual optimal and global optimal solutions through continuous iteration without passing through the “crossover” and “mutation” processes, simplifies the operation of the algorithm. Suppose there is a group of birds in the space foraging. Each bird in the flock is called a “particle”, and the “food” that the final. a. flock is looking for is equivalent to the optimal solution of the problem. In the optimization. al ay. process, all particles have a fitness value, which is determined by the optimized function. At the same time, each particle has two properties: position and velocity. The velocity of a. M. particle determines the direction of flight of the particle, and its velocity is determined by the experience of the particle itself and the location of the surrounding good particles.. of. Tharwat, Mahdi, Elhoseny, and Hassanien (2018) introduced a model uses k-Nearest Neighbor (k-NN) classifier, collect data from sensor units on the chest, legs and arms.. ty. k-NN has only one parameter, k, to determine the number of selected nearest neighbors of. rs i. the test or unknown sample to predict the category label of the unknown sample. Especially. ve. for high-dimensional data, it is difficult to search the k value which has a great impact on the classification performance. Particle Swarm Optimization (PSO) algorithm is used to. ni. search the optimal value of k parameter in k-NN classifier. Three experiments are carried. U. out to prove how PSO in this algorithm searches for the optimal value of k parameters in order to reduce the false classification rate of k-NN classifier. results of PSO-kNN algorithm are compared with two well-known algorithms: Genetic Algorithm (GA) and. Ant Colony Optimization (ABCO) ant. The results also show that the error rate is lower than that of GA and ABCO algorithms. The analysis of the trajectory of a single particle reveals the mechanism of PSO (Clerc & Kennedy, 2002; Ozcan & Mohan, 1999). As far as classical mechanics is concerned,. 20.

(37) a particle is described by its position vector ~x and velocity vector ~v , which determines the trajectory of the particle. Particles move along a definite trajectory in Newtonian mechanics, but not in quantum mechanics. In the quantum world, the term trajectory is meaningless, because according to the principle of uncertainty, the ~x and ~v of particles cannot be determined at the same time. Therefore, if a single particle in the PSO system has quantum behavior, the PSO algorithm must work in a different way.. a. In J. Sun, Feng, and Xu (2004), The quantum theory is introduced into PSO and the. al ay. quantum behavior PSO (QPSO) algorithm is proposed. the experimental results show that QPSO is better than the standard PSO in several benchmark dataset, and it is a promising. M. algorithm. It happens that there is a similar case. (Peng et al., 2016) did similar study, proposal a novel multi-class classification method termed quantum-behaved particle swarm. of. optimization-based kernel extreme learning machine (QPSO-KELM), solve also time and frequency domain problems like human activity. KELM is compared with five existing. ty. classification methods: Linear discriminant analysis (LDA), quadratic discriminant analysis. rs i. (QDA), extreme learning machine (ELM), k-nearest neighbor (KNN) and support vector. ve. machine (SVM). Meanwhile, three traditional optimization methods including particle swarm optimization algorithm (PSO), genetic algorithm (GA) and grid search algorithm. ni. (GS). Where QPSO-KELM obtains 95%, PSO-KELM, GA-KELM and GS-KELM can. U. only achieve 88.75%, 87.5% and 86.25% classification rates, respectively.Therefore, In this Thesis we will implement QPSO for parameter selection and optimization.. 2.5. Research Gaps. As can be seen from the previous literature, HAR has a large number of classification methods after a long period of research. For example, the use of large-scale deep learning, and the use of new methods such as SVM, ELM, KELM and other methods in recent years, the use of offline and online learning methods, the use of feature selection or other 21.

(38) various optimization methods. There are more literature review are further summarized in Table 2.5. Through the study of the literature, the lack of reasonable feature selection, in many algorithms leads to a lot of unrelated feature substitution operations, which is time-consuming and computing-consuming; in recent years, ELM methods are faster and more efficient than deep learning or SVM algorithms, so that KELM or Reduced KELM algorithms are also be introduced. But they either have a lot of computation, or they use a. a. simple random similar to ELM, and we decided to combine the Clustering algorithm to. al ay. solve the above problems; finally, a good optimization affects the entire algorithm. the previous algorithm still has problems in selecting parameter. We will also propose our. M. approach on the selected optimization parameter selection. We will present our model by analyzing the following aspects: Choosing the appropriate Pre-processing aspect to. of. reduce the noise and speed up the operation, the main classifier will optimize the ordinary. 2.6. rs i. selection is used.. ty. ELM and KELM to propose our OPT method, and what kind of The method of parameter. Summary. ve. In this chapter, we first review some basic concepts that are fundamental for the rest of. ni. the thesis.machine learning methods and algorithms are reviewed. Then a comprehensive. U. literature review is given, from pre-processing methods, classification methods, and optimization methods. especially reviewed various ELM methods. The background and literature review provide a good foundation for us to develop our own HAR applications based on both supervised and unsupervised learning.. 22.

(39) Author/Year Methods (Y. Chen et al., PCA-ELM 2012). Dataset Accuracy Self Collec79.68% tion. (Seera, Loo, & Lim, 2014). HARUS. 96.52%. (Al Jeroudi, Ali, Latief, & OS-ELM Akmeliawati, 2015). 82.05%. 95.6%. 91.89%. ty. of. Least Squares Support Vector (Y. Zheng, Machine (LS- USC-HAD 2015) SVM) and Naive Bayes (NB). rs i. M. HARUS. al ay. a. FMM-CART. U. ni. ve. (HanYing et al., OS-KELM HARUS 2018) weighted obser(C. Wang, Xu, vation hidden Liang, Huang, Markov model HARUS & Zhang, (WOHMM) + 2018) SVM (Saha et al., SVM HARUS 2018) (Saha et al., SVM 2018). Findings First paper implement using ELM The extracted rules offer an insight into human activity tasks First ELM based online method, It is proved that ELM algorithm plays an important role in human body recognition. compared ReliefF, Correlationbased Feature Selection (CFS), and Fast Correlation Based Filter (FCBF) state-of-art result for online learning. HAPT. 92.5%. 96%. 94.2%. Proves that a good feature selection method is necessary. SVM method best result SVM method best result, bring fall detection idea to future study. 23.

(40) CHAPTER 3: METHODOLOGY. 3.1. Introduction. In the previous two chapters, we described the state of art for solving HAR problems, and also presented some limitations. In this chapter, we will describe the methods and processes of solving the problems highlighted in the previous chapter. Human activity recognition system using Opt-RKELM algorithm is proposed in detail. The algorithm. al ay. a. is comprised of three steps: We will describe the Data Preprocessing, Classification Algorithms, and Optimization parameter selection. A detailed description of these steps. M. follows:. • Step (1): In the data pre-processing stage, the readings of three axes are combined. of. into magnitude sequence to eliminate the directional dependence. Statistic and frequency C domain features are extracted from the magnitude series of resultant. rs i. data dimension.. ty. acceleration. Finally using ReliefF to reduce feature dimension, K-Means to reduce. ve. • Step (2): The classification model construction stage. For the classification model construction, with the characters of fast learning speed, stability, and good. ni. generalization capability, Optimized Reduced Kernel Extreme Learning Machine is. U. used to build the classification model.. • Step (3): Optimization stage, Quantum-behaved Particle Swarm Optimization QPSO is used to optimize the parameters of KELM kernel.. 24.

(41) a al ay M of ty rs i ve. Figure 3.1: Overview flow of Opt-RKELM. System Overview. ni. 3.2. U. As illustrated in Figure 3.1, Human Activity benchmark Datasets send to feature. reduction step by using ReliefF (ReliefF) to ranking the importance of features, This is. to improve the speed, accuracy, and also data noise. After select features, give K-Means (K-Means) do clustering for data, find 10% data prepare for Kernel Matrix calculation, This is based on the article, W.-Y. Deng et al. (2014) H n is always smaller than n, too large, will make the calculation slower, we choose close to 10% H n in order to mention the speed of operation (W.-Y. Deng et al., 2014), also because normal RKELM choose 10%,. 25.

(42) As an extreme case when n = 100%, it is the same as the KELM algorithm. After the pre-processing stage, our Optimized Reduced Kernel Extreme Learning Machine will be used as the main classifier, by optimization parameter selection, we use QPSO find best suitable parameter.. 3.3. Preprocessing Methods. There are several pre-processing steps needed before we can use the sensor signal data. al ay. a. as input for our classification algorithms. We first reduce the number of noise features in the data, then we use K-means find key point help calculate Kernel matrix for later. 3.3.1. M. classifier use.. Feature Selection - ReliefF Algorithm. of. To refine our feature set, we applied the ReliefF filter method (Kononenko, 1994). The ReliefF filter method is an improvement from the Relief method, ReliefF algorithm. ty. evaluates the ability of each feature to distinguish instances from different classes. ReliefF. rs i. not only considers the correlation between feature values and target classes but also the. ve. distances between the instances, so the key features defined by ReliefF can collect similar instances and separate different instances, thus effectively promoting accurate classification.. ni. In addition, because the feature selection algorithm can reduce calculations, it can save. U. you plenty of time. The ReliefF algorithm measures the features data to determine a ranking of feature. importance. Figure 3.2 and Figure 3.3 shows the results of 561 features from Human Activity Recognition Using Smartphones (HARUS) and Smartphone-Based Recognition of Human Activities and Postural Transitions (HAPT) activity benchmark datasets, ordered by the predicted parameter importance. The higher the bar, the more important this feature is. We select all positive features in this experiments.. 26.

(43) a al ay M of. ty. Figure 3.2: Human Activity Recognition Using Smartphones Dataset from the ReliefF algorithm sorted by relevance. The higher features are more important.. rs i. This is because, in human activities, there are a lot of noises been collected that affect the. ve. classification of the algorithm. These features not only increase the amount of calculation but also affect the performance. Therefore, the feature can be filtered by ReliefF. As shown. ni. in the two figures: Figure 3.2 and Figure 3.3, we take the features separately. 179 and 427. U. feature out of 561 are finally selected. A fundamental idea of the ReliefF algorithm, presented in Figure 3.4 is to estimate. the quality of attributes according to how well their values distinguish between instances that are near to each other. For that purpose, ReliefF randomly selects a feature Ri , then searches for k of its nearest neighbors from the same class, named nearest hits H j , after that also searches k nearest neighbors from each of the different classes, named nearest misses M j (C). It updates the quality estimation W [A] for all attributes A depending on their. 27.

(44) a al ay M of. ty. Figure 3.3: Human Activities and Postural Transitions Dataset from ReliefF algorithm sorted by relevance. The higher features are more important.. rs i. values for Ri , hits H j and misses M j (C). The update formula is similar to that of Relief,. ve. except in reliefF we need average the contribution of all the hits and all the misses. The contribution for each class of the misses is weighted with the prior probability of that class. ni. P(C). Contributions of hits and misses in each step will be in [0, 1] and also symmetric. U. (we explain reasons for that below) we have to ensure that misses’ probability weights sum to 1. As the class of hits is missing in the sum we have to divide each probability weight. with factor 1 − P(class(Ri )) (which represents the sum of probabilities for the misses’ classes). The process is repeated for m times. Selection of k hits and misses is the basic difference to Relief and ensures greater robustness of the algorithm concerning noise. User-defined parameter k controls the locality of the estimates. For most purposes. ReliefF originally used constant influence of. 28.

(45) M. al ay. a. k nearest neighbors with k set to small number like 10 (Kononenko, 1994).. of. Figure 3.4: Pseudo code of ReliefF algorithm.. Clustering Method - K-Means Algorithm. ty. 3.3.2. In human activity recognition, there are usually a large amount of experimental data.. rs i. These data structures have high complexity and high complexity, and contain a lot of. ve. redundant information. The interference caused by the classifier detection process is relatively large, which increases the time complexity of the intrusion detection system. In. ni. our proposed method, K-means is a classical clustering method (Fu & Sun, 2011), which. U. solves the optimal clustering center and optimal classification by learning. It has high learning efficiency and can process large-scale data (JIA, DING, & SHI, 2015). In this thesis, in order to stably reduce the computational complexity of the Kernel algorithm in RKELM, the K-means algorithm is used to cluster the data to achieve stable and high accuracy. The K-Means algorithm has many advantages such as small computational complexity, high efficiency for large datasets, and high linearity of time complexity (Zhao, Li, & Cao,. 29.

(46) 2018). According to W.-Y. Deng et al. (2014), "Of course, just like the work in Wang, Kong, and Zheng (2010) where clustering technique is applied to optimize the algorithm, if optimization technology is used for samples selection in RKELM, the performance of RKELM will be improved further." K-means is an unsupervised learning clustering technology, we used in replace original RKELM’s random select for calculate kernel matrix. The K-Means clustering algorithm is. a. one of the most popular clustering algorithms at present. It uses the Euclidean distance. al ay. metric as the standard similarity analysis method, and divides the whole into k classes with high similarity by the number of randomly generated clusters. Although the random. M. generation of K values is still random, but it is still better than the simple random calculation of the kernel matrix, and has sufficient generalization ability. The experimental results also. of. show that RKELM using K-means is significantly better than ordinary KELM or RKELM. As shown in Figure 3.5. K-means is based on some optimization measures, partitions. ty. the data set into a given number of clusters.. rs i. The clustering problem that the K-means algorithm aims to solve can be expressed as. ve. follows: Given the representation of N patterns, find K clusters based on the similarity measure, so that the patterns within the cluster are more similar to each other (higher than. ni. intra-cluster similarity) Instead of patterns belonging to different clusters (low inter-cluster. U. similarity).. Let X = {xi, i = 1, . . . , N } be the set of N patterns to be clustered into a set of K clusters,. C = {ck }{k = 1, . . . , K }. Typically each pattern is a vector of dimension d(xi ∈ Rd ) and K N. The K-means algorithm finds a partition that minimizes the square of the Euclidean distance between the cluster center and the modes in the cluster. Let µ k be the average of the cluster ck , defined as:. 30.

(47) a al ay M of ty rs i. 1 X µk = xi Nk x ∈c i. (3.1). k. U. ni. ve. Figure 3.5: Flowchart of K-Means. Nk is the number of patterns in cluster ck. J (ck ) =. X. k xi − µ k k 2. (3.2). xi ∈ck. The main goal of the K-means algorithm is to minimize the sum of the squared errors on all K clusters.. 31.

(48) J (C) =. K X X. k xi − µ k k 2. (3.3). k=1 xi ∈ck. The human activity benchmark dataset N sample data, and given the number K of clusters, based on reference paper from W.-Y. Deng et al. (2014), we select K = 10% whole dataset. Firstly, randomly select K samples as the cluster center of the initial partition, and then use iterative according to the similarity measure function. Method, calculating the. al ay. a. distance of the undivided sample data to each cluster center point, and dividing the sample data into the cluster class in which the cluster center is closest, and calculating each cluster class by the calculation The average of all the data in the cluster class continuously moves. is minimal and there is no change.. M. the cluster center, and the cluster is re-divided until the sum of squared errors in the class. of. The main steps of the K-means algorithm are as follows:. ni. ve. rs i. ty. Algorithm 1 K-Means algorithm 1: Base on feature selection dataset from ReliefF 2: Initialize K cluster centers randomly. (K = 10% whole dataset) 3: Allocate each pattern to its closest cluster. 4: Compute new cluster centers using Equation (3.1). 5: Repeat steps 2 and 3 until there is no change for each cluster. 6: Out put the 10% as center point pass to classifier.. Classification Method. U. 3.4. The Kernel Extreme Learning Machine developed by Huang et al. (2010). has been. widely adopted in the machine learning community as a classification technique with high prediction performance. its good generalization even better than SVM (Support Vector Machine) (Huang et al., 2012). But it does not suitable apply to human activity recognition applications in mobile devices because the kernel matrix K (X, X ) need to be built based on the entire data set, Which is very expensive n terms of storage and computation.. 32.

(49) 3.4.1. Extreme Learning Machine with Reduced Kernel Algorithm. In more detail, although KELM has good generalization and stability, because KELM introduces Kernel calculation, when the amount of data is relatively large, the entire data and Kernel calculation operation will take up a lot of computing resources, and the training time is greatly increased, which reduces KELM efficiency. In order to reduce the computational resources required for Kernel computation and shorten the training. a. time of KELM, W. Deng, Zheng, and Zhang (2013) proposed the RKELM algorithm.. al ay. The RKELM algorithm uses the idea of random sampling to randomly select a subset of samples from the entire data sample, replacing the entire data set for the Kernel calculation.. M. However, the random use leads to the instability of RKELM. The random selection subset also affects the accuracy. According to wei Wang, Kong, and ying Zheng (2010) research,. of. Clustering technology is applied to optimize the algorithm, we use K-Means to perform pre-processing to select the same size subset solved the above problem and maintains a. rs i. ty. very short training time on the premise of maintaining accuracy, name it Opt-RKELM. ( )N Given ℵ arbitrary distinct sample (xi, ti ) xi ∈ Rd, ti ∈ Rm , based on K-Means i=1. ve. H = {x i }Hn was selected from original X = {x i } n data set, and X H X, and use results, X i=1 i=1 H X) H in place of K (X, X) to reduced problem data size of Huang’s KELM (Huang et H(X, K. ni. al., 2012).. U. We can define kernel matrix as follow:. H ELM Ω.    K (x 1, x 1 ) · · · K (x 1, xHn )      . . . H .. .. .. = h(xi ) · h(H x j ) = K (xi,H x j ) =       K xHn, x 1 · · · K xHn, xHn  H n×H n. (3.4). and the output weights without regulator can be:. 33.

(50) HT K H −1 K HT T β= K. (3.5). According to the ridge regression theory (Hoerl & Kennard, 1970), adding a positive H X) H TK H X) H , can have H(X, H(X, value I/C (C is regularization coefficient) to diagonal of K better generalization performance(Huang et al., 2012; W. Deng, Zheng, & Chen, 2009). so. ! −1. al ay. I HT K H β= +K C. a. the output weight β finally as follows:. HT T K. (3.6). Where C is the regularization parameter, handle over-fitting problem, T is output matrix,. M. H:H the size of K n is always smaller then n, if H n = n, Opt-RKELM will become KELM, in. of. n nearly 10% of n for compare with Deng’s model (W.-Y. Deng et this thesis, we will set H al., 2014).. ty. There are some common kernel function, linear kernel function, polynomial kernel. ve. rs i. function, Gaussian kernel function, wavelet kernel function: (radial basis function). (3.7). Linear Kernel: K (x i, x j ) = xTi x j. (3.8). Polynomial Kernel: K (x i, x j ) = (xTi x j + r) d. (3.9). U. ni. k xi − x j k 2 RBF Kernel: K (x i, x j ) = exp(− ) 2σ 2. Sigmoid Kernel: K x i, x j = tanh ρhx i, x j i + r Wavelet Kernel: K (x i, x j ) = cos(w. k x i − x j k (−(k x i −x j k 2 / f )) )e b. (3.10) (3.11). d, f , b, w and σ are real constant parameters, It has been found that RBF Kernel(Gaussian kernel) is giving the best result.. 34.

(51) Thus, the output function of Opt-RKELM can be written as: T. *. K(xi, x1 ) +/ .. // −1 . // ( I + K . H X) H TK H X)) H H X) H TT H (X, H (X, H(X, .. f (x) = .. K / C .. // . / K(H xi,H xN ) , -. (3.12). Kernel parameter of kernel function, together with regularization coefficient C in. 3.5. al ay. a. Equation (3.6) will be optimized by QPSO.. Optimization - Quantum-behaved Particle Swarm Optimization Algorithm. It is well known that the performances can be affect by parameters in algorithms.. M. Therefore, Optimization need use to optimize the value of parameters of classifier. Opt-RKELM is built by ELM combined with Kernel function instead of the hidden. of. node. Then, due to the existence of kernel function, Opt-RKELM is very sensitive to. ty. Kernel parameters setting. Therefore, it is very important to use QPSO to optimize the. rs i. parameters of Opt-RKELM. In this thesis, it is used QPSO combination method is used to improve the accuracy of classification. The results show that the method has excellent. ve. classification performance in motion recognition. The flowchart of this QPSO is illustrated. ni. in Figure 3.6. U. The quantum particle swarm optimization algorithm is developed on the basis of particle. swarm optimization. QPSO uses the basic idea of quantum mechanics to optimize the particle swarm, which overcomes the shortcomings of PSO convergence rate and easy to fall into local optimum. The search can be performed in the entire solution space, and the evolution equation has few parameters, simple form, strong global search ability and robustness. In QPSO, the position equation of a particle is:. X (t + 1) = P ± β · |Pmbest − X (t)| · ln(1/u). (3.13) 35.

(52) a al ay M of rs i. ty. Figure 3.6: Flowchart of QPSO.. Where t represents the number of iterations, X (t) represents t is the current position. ve. of the particle group, P is the best individual location, β is the contraction expansion. ni. coefficient (the convergence speed of the control algorithm), Pmbest is the optimal position. U. for particle swarm evaluation: U ∈ (0, 1). Suppose M is the population size, N is the dimension of the particle, and P = [pi1, pi2, · · · , pin ] is the optimal position of the ith particle. then:. Pmbest. M M M M 1 X 1 X 1 X + 1 X * = Pi = Pi1 , Pi2 , . . . Pin M i=1 M M M , i=1 i=1 i=1. (3.14). And each particle converges to its own random point P:. 36.