## FEATURE SELECTION AND ENHANCED KRILL HERD ALGORITHM FOR TEXT DOCUMENT

## CLUSTERING

## LAITH MOHAMMAD QASIM ABUALIGAH

## UNIVERSITI SAINS MALAYSIA

## 2018

## FEATURE SELECTION AND ENHANCED KRILL HERD ALGORITHM FOR TEXT DOCUMENT

## CLUSTERING

by

## LAITH MOHAMMAD QASIM ABUALIGAH

Thesis submitted in fulfilment of the requirements for the degree of

Doctor of Philosphy

## March 2018

### ACKNOWLEDGEMENT

Beginning this Ph.D. thesis has been a life-changing experience for me and I would not have achieved it without the guidance and support of many people. I must declare many external donations from individuals who extended a helping hand throughout this study.

I am thankful to Allah SWT for giving me strength to finish this study. I am also grateful to my supervisor, Professor Dr. Ahamad Tajudin Khader from the School of Computer Sciences at Universiti Sains Malaysia, for his wise counsel, helpful advice, connected support and supervision throughout the duration of this study. I am also thankful to my co-supervisor, Dr. Mohammed Azmi Al-Betar from the Department of Information Technology, Al-Huson University College, Al-Balqa Applied University, for his assistance. I am also thankful to Dr. Essam Said Hanandeh from the Department of Computer Information System, Zarqa University, for his assistance.

My family deserves special thanks. Words cannot express how grateful I am to my father, mother, and brothers for all of the sacrifices that they have done for me. Finally, I thank all of my friends who encouraged me throughout the duration of this study.

### TABLE OF CONTENTS

Acknowledgement . . . ii

Table of Contents . . . iii

List of Tables . . . x

List of Figures . . . xv

List of Abbreviations . . . xviii

Abstrak . . . xx

Abstract . . . xxi

CHAPTER 1 – INTRODUCTION 1.1 Background . . . 1

1.2 Motivation and Problem Statement . . . 3

1.3 Research Objectives . . . 6

1.4 Contributions . . . 7

1.5 Research Scope . . . 9

1.6 Research Methodology . . . 9

1.7 Thesis Structure. . . 11

CHAPTER 2 – KRILL HERD ALGORITHM 2.1 Introduction . . . 13

2.2 Krill Herd Algorithm . . . 13

2.3 Why the KHA has been Chosen for Solving the TDCP . . . 14

2.4 Krill Herd Algorithm: Procedures . . . 15

2.4.1 Mathematical Concept of Krill Herd Algorithm . . . 15

2.4.1(a) Movement Induced by other Krill Individuals . . . 16

2.4.1(b) Foraging Motion: . . . 20

2.4.1(c) Physical Diffusion: . . . 21

2.4.1(d) Updating the Krill Individuals: . . . 22

2.4.2 The Genetic Operators . . . 23

2.4.2(a) Crossover Operator of KH Algorithm: . . . 23

2.4.2(b) Mutation Operator of KH Algorithm: . . . 24

2.5 Conclusion . . . 25

CHAPTER 3 – LITERATURE REVIEW 3.1 Introduction . . . 26

3.2 Background . . . 26

3.3 Text Document Clustering Applications . . . 27

3.4 Variants of the Weighting Schemes . . . 28

3.5 Similarity Measures . . . 32

3.5.1 Cosine Similarity Measure . . . 33

3.5.2 Euclidean Distance Measure . . . 35

3.6 Text Feature Selection Method . . . 36

3.7 Metaheuristics Algorithm for Text Feature Selection. . . 38

3.7.1 Genetic Algorithm for the Feature Selection . . . 38

3.7.2 Harmony Search for the Feature Selection . . . 39

3.7.3 Particle Swarm Optimization for the Feature Selection . . . 40

3.8 Dimension Reduction Method . . . 41

3.9 Partitional Text Document Clustering . . . 45

3.9.1 K-mean Text Clustering Algorithm . . . 45

3.9.2 K-medoid Text Clustering Algorithm . . . 47

3.10 Meta-heuristics Algorithms for Text Document Clustering Technique . . . 48

3.10.1 Genetic Algorithm . . . 48

3.10.2 Harmony Search Algorithm . . . 49

3.10.3 Particle Swarm Optimization Algorithm . . . 50

3.10.4 Cuckoo Search Algorithm . . . 52

3.10.5 Ant Colony Optimization Algorithm . . . 54

3.10.6 Artificial Bee Colony Optimization Algorithm . . . 55

3.10.7 Firefly Algorithm. . . 56

3.11 Hybrid Techniques for Text Document Clustering. . . 58

3.12 The Krill Herd Algorithm . . . 61

3.12.1 Modifications of Krill Herd Algorithm . . . 63

3.12.1(a) Binary-Based Krill Herd Algorithm. . . 63

3.12.1(b) Chaotic-Based Krill Herd Algorithm. . . 63

3.12.1(c) Fuzzy-Based Krill Herd Algorithm . . . 64

3.12.1(d) Discrete-Based Krill Herd Algorithm . . . 64

3.12.1(e) Opposition-Based Krill Herd Algorithm . . . 65

3.12.1(f) Other Modifications . . . 65

3.12.2 Hybridizations of Krill Herd Algorithm . . . 66

3.12.2(a) Hybridization with Local Search-Based Algorithm . . . 66

3.12.2(b) Hybridization with Population-Based Algorithm. . . 67

3.12.3 Multi-objective Krill Herd Algorithm . . . 68

3.13 Critical Analysis . . . 68

3.14 Summary . . . 72

CHAPTER 4 – PROPOSED METHODOLOGY 4.1 Introduction . . . 76

4.2 Research Methodology Outline . . . 76

4.3 Text Pre-processing. . . 77

4.3.1 Tokenization . . . 77

4.3.2 Stop Words Removal . . . 79

4.3.3 Stemming . . . 79

4.3.4 Text Document Representation . . . 79

4.4 Term Weighting Scheme . . . 80

4.4.1 The Proposed Weighting Scheme . . . 81

4.4.2 Illustrative example . . . 84

4.5 Text Feature Selection Problem . . . 85

4.5.1 Text Feature Selection Descriptions and Formulations . . . 86

4.5.2 Representation of Feature Selection Solution . . . 87

4.5.3 Fitness Function . . . 88

4.5.4 Metaheuristic Algorithms for Text Feature Selection Problem . . . 88

4.5.4(a) Genetic Algorithm: Procedure . . . 89

4.5.4(b) Harmony Search Algorithm: Procedure . . . 91

4.5.4(c) Particle Swarm Optimization: Procedure . . . 94

4.6 Proposed Detailed Dimension Reduction Technique . . . 98

4.7 Text Document Clustering . . . 105

4.7.1 Text Document Clustering Problem Descriptions and Formulations 106 4.7.2 Solution Representation of Text Document Clustering Problem . . . 106

4.7.3 Fitness Function . . . 107

4.8 Developing Krill Herd-based Algorithms . . . 108

4.8.1 Basic Krill Herd Algorithm for Text Document Clustering Problem 108 4.8.1(a) Mapping Krill Herd Algorithm Terms with Optimization and Text Document Clustering Concepts . . 109

4.8.1(b) Overview of Krill Herd Algorithm . . . 109

4.8.2 Summary of Basic Krill Herd Algorithm . . . 114

4.8.3 Modified Krill Herd Algorithm for Text Document Clustering Problem . . . 115

4.8.3(a) Introduction. . . 115

4.8.3(b) Overview of Modified Krill Herd Algorithm . . . 115

4.8.3(c) Summary of Modified Krill Herd Algorithm . . . 117

4.8.4 Hybrid Krill Herd Algorithm for Text Document Clustering Problem . . . 118

4.8.4(a) Introduction. . . 118

4.8.4(b) Overview of Hybrid Krill Herd Algorithm . . . 118

4.8.4(c) Summary of Hybrid Krill Herd Algorithm. . . 121

4.8.5 Multi-objective Hybrid Krill Herd Algorithm for Text Document Clustering Problem. . . 122

4.8.5(a) Introduction. . . 122

4.8.5(b) Overview of Multi-objective Hybrid Krill Herd Algorithm . . . 123

4.8.5(c) Summary of Multi-objective Hybrid Krill Herd Algorithm . . . 126

4.9 Experiments and Results . . . 126

4.9.1 Comparative Evaluation and Analysis . . . 126

4.9.1(a) Parameter Settings for Text Feature Selection Algorithms 127 4.9.1(b) Parameter Settings for Text Document Clustering Algorithms. . . 128

4.9.2 Measures for Evaluating the Quality of Final Solution (Clusters) . . 129

4.9.2(a) Accuracy Measure . . . 129

4.9.2(b) Purity Measure . . . 130

4.9.2(c) Entropy Measure . . . 131

4.9.2(d) Precision Measure . . . 131

4.9.2(e) Recall Measure . . . 132

4.9.2(f) F-measure. . . 132

4.10 Conclusion . . . 133

CHAPTER 5 – EXPERIMENTAL RESULTS

5.1 Introduction . . . 134

5.1.1 Benchmark Text Datasets . . . 135

5.2 Feature Section methods for Text Document Clustering Problem . . . 136

5.2.1 Experimental Design . . . 136

5.2.2 Results and Discussions . . . 138

5.2.2(a) Evaluation of Proposed Methods based on Evaluation Measures. . . 139

5.2.2(b) Evaluation of Proposed Methods in Terms of Number of Selected Features . . . 145

5.2.2(c) Evaluation of Proposed Methods based on Threshold Values . . . 147

5.2.2(d) Evaluation of Proposed Methods based on Computational Time . . . 147

5.2.2(e) Statistical Analysis . . . 150

5.2.3 Summary of Feature Selection Methods . . . 150

5.3 Basic Krill Herd Algorithm for Text Document Clustering Problem . . . 152

5.3.1 Experimental Design . . . 152

5.3.2 Experimental Results . . . 153

5.3.3 Parameter Setting . . . 153

5.3.4 Comparative and Analysis . . . 155

5.3.5 Basic Krill Herd Algorithm Summary . . . 167

5.4 Modified KH algorithm for Text Document Clustering Problem . . . 167

5.4.1 Experiments Design . . . 167

5.4.2 Results and Discussions . . . 168

5.4.3 Modified Krill Herd Algorithm Summary . . . 174

5.5 Hybrid KH Algorithm for Text Document Clustering Problem . . . 175

5.5.1 Experimental Setup . . . 175

5.5.2 Results and Discussions . . . 175

5.5.3 Hybrid Krill Herd Algorithm Summary . . . 182

5.6 Multi-objective Hybrid KH Algorithm for Text Document Clustering Problem . . . 184

5.6.1 Experimental Setup . . . 184

5.6.2 Results and Discussions . . . 184

5.6.3 Multi-objective Hybrid Krill Herd Algorithm Summary . . . 190

5.7 Comparing Results among Proposed Methods . . . 190

5.8 Comparison with Previous Methods . . . 196

5.9 Conclusion . . . 200

CHAPTER 6 – CONCLUSION AND FUTURE WORK 6.1 Introduction . . . 202

6.2 Research Summary . . . 202

6.3 Contributions against Objectives . . . 203

6.4 Future Research. . . 203

**REFERENCES** ... ... ..

205
LISTOFPUBLICATIONS
### LIST OF TABLES

Page

Table 3.1 Overview of the feature selection algorithms 73

Table 3.2 Overview of the text clustering algorithms 74

Table 3.2 (Cont...) Overview of the text clustering algorithms 75

Table 4.1 Terms frequency. 84

Table 4.2 Terms weight using TF-IDF. 85

Table 4.3 Terms weight using LFW. 86

Table 4.4 The text feature selection problem and optimization terms in

the genetic algorithm context. 89

Table 4.5 The text feature selection problem and optimization terms in

the harmony search algorithm context. 91

Table 4.6 The text feature selection problem and optimization terms in

the particle swarm optimization algorithm context. 95 Table 4.7 The text document clustering and optimization term in the krill

herd solutions context 109

Table 4.8 The parameters values for different variant of text feature

selection algorithms. 128

Table 4.9 The best configuration parameters of the TD clustering

algorithms. 129

Table 5.1 Description of document datasets that used in this research. 137 Table 5.2 Summary of experimental methods using k-mean clustering

algorithm 138

Table 5.3 Comparing the performance of the k-mean text clustering

algorithms in terms of the purity and entropy measures. 140 Table 5.4 Comparing the performance of the k-mean text clustering

algorithms in terms of the precision and recall measures. 142 Table 5.5 Comparing the performance of the k-mean text clustering

algorithms in terms of the accuracy and F-measure measures. 143

Table 5.6 The number of the best results obtained by the proposed

methods in terms of the evaluation measures over all datasets. 145 Table 5.7 The number of the best results obtained by the feature selection

algorithms (GA, HS, and PSO) in terms of the evaluation

measures over all datasets. 145

Table 5.8 Summary of the experimental using DDR to adjust the

threshold value. 148

Table 5.9 Dimension reduction ratio when threshold = 25 using DDR. 148 Table 5.10 The average ranking of k-mean clustering algorithms based on

the F-measure. i.e., lower rank value is the best method. 151 Table 5.11 Convergence scenarios of the basic krill herd algorithm (BKHA). 155 Table 5.12 The results of BKHA convergence scenarios (Scenario.(1)

through Scenario.(5)) 156

Table 5.13 The results of BKHA convergence scenarios (Scenario.(6)

through Scenario.(10)) 157

Table 5.14 The results of BKHA convergence scenarios (Scenario.(11)

through Scenario.(15)) 158

Table 5.15 The results of BKHA convergence scenarios (Scenario.(16)

through Scenario.(20)) 159

Table 5.16 The versions of the proposed basic krill herd algorithms

(BKHAs) for text document clustering problem (TDCP). 160 Table 5.17 Comparing the performance of the text document clustering

algorithms along with original datasets and proposed datasets

using accuracy measure values. 162

Table 5.18 Comparing the performance of the text document clustering algorithms along with original datasets and proposed datasets

using purity measure values. 163

Table 5.19 Comparing the performance of the text document clustering algorithms along with original datasets and proposed datasets

using entropy measure values. 163

Table 5.20 Comparing the performance of the text document clustering algorithms along with original datasets and proposed datasets

using precision measure values. 164

Table 5.21 Comparing the performance of the text document clustering algorithms along with original datasets and proposed datasets

using recall measure values. 164

Table 5.22 Comparing the performance of the text document clustering algorithms along with original datasets and proposed datasets

using F-measure values. 165

Table 5.23 The best results obtained by the basic clustering algorithms in

terms of the evaluation measures over all datasets. 167 Table 5.24 Versions of the proposed modified KH algorithm. 168 Table 5.25 Comparing the performance of the text document clustering

algorithms using the average accuracy measure values. 169 Table 5.26 Comparing the performance of the text document clustering

algorithms using the average purity measure values. 169 Table 5.27 Comparing the performance of the text document clustering

algorithms using the average entropy measure values. 169 Table 5.28 Comparing the performance of the text document clustering

algorithms using the average precision measure values. 171 Table 5.29 Comparing the performance of the text document clustering

algorithms using the average recall measure values. 171 Table 5.30 Comparing the performance of the text document clustering

algorithms using the average F-measure measure values and its

ranking. 172

Table 5.31 The average ranking of the modified clustering algorithms

based on the F-measure. i.e., lower rank value is the best method. 172 Table 5.32 The best results obtained by the modified clustering algorithms

in terms of the evaluation measures over all datasets. 174

Table 5.33 Versions of the proposed hybrid KH algorithm. 175

Table 5.34 Comparing the performance of the hybrid text document

clustering algorithms using the average accuracy measure values. 176 Table 5.35 Comparing the performance of the hybrid text document

clustering algorithms using the average purity measure values. 177 Table 5.36 Comparing the performance of the hybrid text document

clustering algorithms using the average entropy measure values. 178 Table 5.37 Comparing the performance of the hybrid text document

clustering algorithms using the average precision measure values. 178 Table 5.38 Comparing the performance of the hybrid text document

clustering algorithms using the average recall measure values. 179

Table 5.39 Comparing the performance of the hybrid text document

clustering algorithms using the average F-measure values. 179 Table 5.40 The average ranking of the hybrid clustering algorithms based

on the F-measure. i.e., lower rank value is the best method. 181 Table 5.41 The best results obtained by the hybrid clustering algorithms in

terms of the evaluation measures over all datasets. 181 Table 5.42 Comparing the performance of the multi-objective hybrid text

document clustering algorithms using the average accuracy

measure values. 185

Table 5.43 Comparing the performance of the multi-objective hybrid text document clustering algorithms using the average purity

measure values. 185

Table 5.44 Comparing the performance of the multi-objective hybrid text document clustering algorithms using the average entropy

measure values. 186

Table 5.45 Comparing the performance of the multi-objective hybrid text document clustering algorithms using the average precision

measure values. 186

Table 5.46 Comparing the performance of the multi-objective hybrid text document clustering algorithms using the average recall

measure values. 186

Table 5.47 Comparing the performance of the multi-objective text document clustering algorithms using the average F-measure

values and its ranking. 187

Table 5.48 The average ranking of the multi-objective clustering algorithm based on the F-measure. i.e., lower rank value is the best method. 188 Table 5.49 The best results obtained by the multi-objective clustering

algorithm in terms of the evaluation measures over all datasets. 188 Table 5.50 Comparison results between the krill herd-based methods

according to F-measure evaluation criteria. 191

Table 5.50 (Cont...) Comparison results between the krill herd-based

methods according to F-measure evaluation criteria. 192 Table 5.51 The average ranking of the krill-based algorithms based on the

average F-measure. i.e., lower rank value is the best method. 194 Table 5.52 Significance tests of the basic KH algorithms on the original

datasets and the proposed datasets usingt-test withα<0.05.

Highlight (bold) denote that result is significantly different. 195

Table 5.53 Significance tests of the basic KH algorithms and the modified KH algorithm usingt-test withα<0.05. Highlight (bold) denote

that result is significantly different. 195

Table 5.54 Significance tests of the modified KH algorithms and the hybrid KH algorithm usingt-test withα<0.05. Highlight (bold) denote

that result is significantly different. 196

Table 5.55 Significance tests of the hybrid KH algorithms and the

multi-objective hybrid KH algorithm usingt-test withα<0.05.

Highlight (bold) denote that result is significantly different. 196

Table 5.56 Key to the comparator methods. 197

Table 5.57 Description of text document datasets that used by the

comparative methods. 198

Table 5.58 A comparison of the results obtained by MHKHA and

best-published results. 199

Table 5.58 (Cont...) A comparison of the results obtained by MHKHA and

best-published results. 200

### LIST OF FIGURES

Page

Figure 1.1 Research methodology. 10

Figure 2.1 A flowchart of basic krill herd algorithm (Bolaji, Al-Betar,

Awadallah, Khader, & Abualigah, 2016). 15

Figure 2.2 A schematic represents the sensing domain around a KI (Bolaji

et al., 2016). 19

Figure 3.1 An example of the clustering search engine results by Yippy. 29

Figure 4.1 The research methodology stages 78

Figure 4.2 Sigmoid function used in binary PSO algorithm. 98

Figure 4.3 Representation of the Text Clustering Solution. 107 Figure 4.4 The flowchart of adapting basic KH algorithm to text document

clustering problem. 111

Figure 4.5 The flowchart of the modified krill herd algorithm. 115 Figure 4.6 The flowchart of the hybrid krill herd algorithm. 119 Figure 4.7 The flowchart of the multi-objective hybrid krill herd algorithm. 123

Figure 5.1 The general design of the experiments in the first stage. 134 Figure 5.2 The general design of the experiments in the second stage. 135

Figure 5.3 A snapshot of the dataset file 136

Figure 5.4 The experimental of the proposed methods in the first stage. 138 Figure 5.5 The number of features in each dataset using three feature

selection algorithms and dimension reduction techniques. 146

Figure 5.5(a) Dataset 1 146

Figure 5.5(b) Dataset 2 146

Figure 5.5(c) Dataset 3 146

Figure 5.5(d) Dataset 4 146

Figure 5.5(e) Dataset 5 146

Figure 5.5(f) Dataset 6 146

Figure 5.5(g) Dataset 7 146

Figure 5.5(h) Legend 146

Figure 5.6 Average computation time of the k-mean iteration (in second). 149 Figure 5.7 The average similarity document centroid (ASDC) values of the

basic text clustering algorithms for 20 runs plotted against 1000

iterations on seven text datasets. 166

Figure 5.7(a) Dataset 1 166

Figure 5.7(b) Dataset 2 166

Figure 5.7(c) Dataset 3 166

Figure 5.7(d) Dataset 4 166

Figure 5.7(e) Dataset 5 166

Figure 5.7(f) Dataset 6 166

Figure 5.7(g) Dataset 7 166

Figure 5.8 The average similarity document centroid (ASDC) values of the text clustering using modified KH algorithms for 20 runs

plotted against 1000 iterations on seven text datasets. 173

Figure 5.8(a) Dataset 1 173

Figure 5.8(b) Dataset 2 173

Figure 5.8(c) Dataset 3 173

Figure 5.8(d) Dataset 4 173

Figure 5.8(e) Dataset 5 173

Figure 5.8(f) Dataset 6 173

Figure 5.8(g) Dataset 7 173

Figure 5.9 The average similarity document centroid (ASDC) values of the text clustering using hybrid KH algorithms for 20 runs plotted

against 1000 iterations on seven text datasets. 183

Figure 5.9(a) Dataset 1 183

Figure 5.9(b) Dataset 2 183

Figure 5.9(c) Dataset 3 183

Figure 5.9(d) Dataset 4 183

Figure 5.9(e) Dataset 5 183

Figure 5.9(f) Dataset 6 183

Figure 5.9(g) Dataset 7 183

Figure 5.10 The average similarity document centroid (ASDC) values of the text clustering using multi-objective hybrid KH algorithm for

20 runs plotted against 1000 iterations on seven text datasets. 189

Figure 5.10(a) Dataset 1 189

Figure 5.10(b) Dataset 2 189

Figure 5.10(c) Dataset 3 189

Figure 5.10(d) Dataset 4 189

Figure 5.10(e) Dataset 5 189

Figure 5.10(f) Dataset 6 189

Figure 5.10(g) Dataset 7 189

### LIST OF ABBREVIATIONS

ABC Ant Colony Optimization

ASDC Average Similarity of Documents Centroid BCO Bee Colony Optimization

BKHA Basic Krill Herd Algorithm

BPSO Binary Particle Swam Optimization

CS Cuckoo Search

DDF Detailed Document Frequency DDR Detailed Dimension Reduction

DF Document Frequency

DFTF Document Frequency with Term Frequency DR Dimension Reduction

DTF Detailed Term Frequency FE Feature Extraction FF Fitness Function FS Feature Selection GA Genetic Algorithm

HKHA Hybrid Krill Herd Algorithm

HS Harmony Search

IDF Inverse Document Frequency

KH Krill Herd

KHA Krill Herd Algorithm KHM Krill Herd Memory KI Krill Individual

LFW Length Feature Weight

MKHA Modified Krill Herd Algorithm NLP Natural Language Processing NP Non-deterministic Polynomial-time PSO Particle Swam Optimization

TC Text Clustering

TD Text Document

TDCP Text document clustering problem

TF Term Frequency

TFSP Text feature selection problem VSM Vector Space Model

WTDC Web Text Documents Clustering

PEMILIHAN FITUR DAN ALGORITMA KRILL HERD LANJUTAN UNTUK PENGKLUSTERAN DOKUMEN TEKS

ABSTRAK

Pengklusteran dokumen teks adalah satu tren baru dalam galian teks di mana dokumen- dokumen diasingkan kepada beberapa kluster yang koheren, di mana dokumen-dokumen dalam kluster yang sama adalah serupa. Dalam kajian ini, satu kaedah baru untuk me- nyelesaikan masalah pengklusteran dokumen teks dijalankan dalam dua peringkat: (i) Satu kaedah pemilihan fitur menggunakan algoritma optima kumpulan partikel dengan satu skima pemberat yang baru dan satu teknik pengurangan dimensi yang lengkap di- cadangkan untuk mendapatkan satu subset baru fitur-fitur yang lebih bermaklumat de- ngan ruang berdimensi rendah. Subset baru ini digunakan untuk memperbaiki prestasi algoritma pengklusteran teks dalam peringkat berikutnya dan ini mengurangan masa pengiraannya. Algoritma pengklusteran min-k digunakan untuk menilai keberkesanan subset-subset yang diperolehi. (ii) Empat algoritma krill herd iaitu (a) algoritma krill herd asas, (b) algoritma krill herd yang telah diubahsuai, (c) algoritma krill herd hibrid, dan (d) algoritma hibrid pelbagai objektif krill herd, disarankan untuk menyelesaik- an masalah pengklusteran teks; algoritma ini adalah penambahbaikan lanjutan kepada versi-versi yang terdahulu. Untuk proses penilaian, tujuh set data teks penanda aras digunakan dengan pencirian dan kesukaran yang berbeza. Keputusan menunjukkan bahawa kaedah yang dicadangkan dan algoritma yang diperolehi mencapai keputusan terbaik berbanding dengan kaedah-kaedah lain yang diutarakan dalam literatur.

FEATURE SELECTION AND ENHANCED KRILL HERD ALGORITHM FOR TEXT DOCUMENT CLUSTERING

ABSTRACT

Text document (TD) clustering is a new trend in text mining in which the TDs are separated into several coherent clusters, where documents in the same cluster are similar. In this study, a new method for solving the TD clustering problem worked in the following two stages: (i) A new feature selection method using particle swarm optimization algorithm with a novel weighting scheme and a detailed dimension re- duction technique are proposed to obtain a new subset of more informative features with low-dimensional space. This new subset is used to improve the performance of the text clustering (TC) algorithm in the subsequent stage and reduce its computation time. The k-mean clustering algorithm is used to evaluate the effectiveness of the ob- tained subsets. (ii) Four krill herd algorithms (KHAs), namely, (a) basic KHA, (b) modified KHA, (c) hybrid KHA, and (d) multi-objective hybrid KHA, are proposed to solve the TC problem; these algorithms are incremental improvements of the preceding versions. For the evaluation process, seven benchmark text datasets are used with dif- ferent characterizations and complexities. Results show that the proposed methods and algorithms obtained the best results in comparison with the other comparative methods published in the literature.

### CHAPTER 1 INTRODUCTION

1.1 Background

With the growth of the amount of text information on Internet web pages and mod- ern applications, in general, interest in the text analysis area has increased to facili- tate the processing of a large amount of unorganized text information (Sadeghian &

Nezamabadi-pour, 2015).

Text clustering (TC) is an efficient unsupervised learning technique used to deal with numerous text documents (TDs) without any foreknowledge of the class label of the document (Prakash, Hanumanthappa, & Mamatha, 2014). This technique partitions a set of large TDs into meaningful and coherent clusters by collating relevant (similar) documents in the same cluster based on its intrinsic characteristics (Cobos et al., 2014).

The same clusters (groups) contain relevant and similar TDs. Meanwhile, different clusters contain irrelevant and dissimilar TDs (L. M. Abualigah, Khader, & Al-Betar, 2016a).

In the modern era, clustering is an important activity because of the size of text information on Internet web pages (Oikonomakou & Vazirgiannis, 2010). Clustering is used to determine relevant TDs and facilitate TD display by groups that share the same pattern and contents (Cobos et al., 2014). The TC technique is successfully utilized in many research areas to facilitate the text analysis process, such as data mining, digital forensics analysis, and information retrieval (Forsati, Mahdavi, Shamsfard, &

Meybodi, 2013).

Vector space model (VSM) is the most common model used in TC to represent each document; in this model, each term in the TDs is a feature (word) for document representation (Salton, Wong, & Yang, 1975; Yuan, Ouyang, & Xiong, 2013). The TDs are represented by a multi-dimensional space, in which the position value of each dimension corresponds to a term frequency (TF) value. The text features generated from different text terms, even in a small document, would be represented by hundreds and/or thousands of text features. Thus, TDs will have high-dimensional informative and uninformative features (i.e., irrelevant, redundant, unevenly distributed, and noisy features). These uninformative features can be eliminated using the feature selection (FS) technique (Bharti & Singh, 2016b; L. Zheng, Diao, & Shen, 2015).

FS techniques are nondeterministic polynomial time-hard optimization methods used to determine the optimal subset of informative text features and improve the per- formance of the TC method while maintaining the necessary text information (Bharti

& Singh, 2016b; K.-C. Lin, Zhang, Huang, Hung, & Yen, 2016). Typically, these techniques are performed even without any foreknowledge of the class label of the document. Conventionally, these techniques are divided into three main types, namely, FS based on document frequency (DF), FS based on TF, and hybrid feature technique based on DF and TF (Y. Wang, Liu, Feng, & Zhu, 2015). Several text-based stud- ies rely on FS methods, such as TC (L. M. Abualigah, Khader, & Al-Betar, 2016b), text classification (Z. Zheng, Wu, & Srihari, 2004), and data mining (K.-C. Lin et al., 2016). Recently, metaheuristic algorithms have been successfully used in the area of text mining to solve the text document clustering problems (TDCPs) and text feature

selection problems (TFSPs) (BoussaïD, Lepagnot, & Siarry, 2013).

The application of FS techniques produces a new subset with numerous informa- tive text features. However, the dimensionality is still high because all dimensions remain even after removing the uninformative features. The dimensional space of this subset must be reduced further to facilitate the TC process (Lu, Liang, Ye, & Cao, 2015). High-dimensional feature space has become a significant challenge to the TC domain because it increases the computational time while decreases the efficiency of TC techniques (van der MLJP & van den HH, 2009). Thus, a dimension reduction (DR) technique is necessary to produce a new low-dimensional subset of useful fea- tures (Diao, 2014; Esmin, Coelho, & Matwin, 2015; Sorzano, Vargas, & Montano, 2014a). This technique will reduce the computation time and improve the perfor- mance of the TC algorithm. The DR technique should eliminate useless text features;

eliminate unnecessary, redundant, and noisy text features; preserve intrinsic informa- tion; and significantly reduce the dimension of the text feature space (Bharti & Singh, 2014b; Raymer, Punch, Goodman, Kuhn, & Jain, 2000).

1.2 Motivation and Problem Statement

Recently, unorganized TDs on Internet web pages and modern applications have in-
creased exponentially, and the number of Internet users in the world has exceeded
three billion^{1}. These users face difficulties in obtaining the information that they need
easily and neatly (Bharti & Singh, 2014b; U˘guz, 2011). The process of managing such
a large TD is called TD clustering technique, which transforms a set of large unor-
ganized TDs into coherent and similar groups, that is, clusters, which facilitate user

1https://en.wikipedia.org/wiki/List_of_countries_by_number_of_Internet_users

browsing and searching for information. From the literature on TC techniques, four main problems are identified and explained as follows:

First, TDs usually contain informative and uninformative text features. Uninforma- tive features can confuse and mislead the TD clustering algorithm, thereby reducing the performance of the clustering algorithm (Bharti & Singh, 2014b, 2015b). Therefore, identifying and removing these uninformative text features can improve the perfor- mance of the clustering algorithm and reduce the computation time. The main draw- back of these methods is focusing only on selecting a new subset of text features that rely on the existing weighting scheme (score) (Bharti & Singh, 2016b). The current weighting scheme has certain weaknesses in evaluating the features by computing the weight score for all features equally using one main factor (i.e., term frequency). Thus, the distinction between informative and uninformative text features over the document is insufficient (Ahmad, Abu Bakar, & Yaakub, 2015; Bharti & Singh, 2016b; Cobos, León, & Mendoza, 2010a; Moayedikia, Jensen, Wiil, & Forsati, 2015). The weighted score should be more accurate to facilitate the process of the text FS technique, where it plays the main role in the FS procedure by distinguishing between TD features by providing a high score to the more informative features.

Second, the high-dimensional feature space is one of the most critical weaknesses of TDs because it influences the process of TD clustering techniques by increasing the execution time and decreasing the performance of the TD clustering algorithm (Bharti & Singh, 2014b; Nebu & Joseph, 2016). The high-dimensional feature space contains the necessary (useful) and unnecessary (useless) text features. Thus, the DR technique reduces the dimensional feature space by pruning useless text features to

improve the performance of the clustering algorithm. One of the possible ways to solve the high-dimensional feature space is the DF method. This method deals with the reduction process with fixed roles (DF of the feature) in making the decision to prune useless text features (Esmin et al., 2015; Tang, Shepherd, Milios, & Heywood, 2005; Yao, Coquery, & Lê Cao, 2012a). The fundamental premise of the DF method is impractical because the frequently occurring features are considered more important in the documents than the infrequently occurring features (Bharti & Singh, 2015b).

Third, the main advantage of the TC algorithm is its effectiveness in guarantee- ing access to the accurate clusters. Over the past few years, a large proportion of researchers in the TC domain applied metaheuristic algorithms to solve the TDCPs.

However, a major drawback of these algorithms is that it provides a good exploration of the search space at the cost of exploitation (Bharti & Singh, 2016a). Other problems are related to unsatisfactory outcomes, such as inaccurate clusters, and the behavior of the algorithms that were selected is inappropriate for the problem of the TC instances (Bharti & Singh, 2015a; Binu, 2015; Forsati, Keikha, & Shamsfard, 2015; G.-G. Wang, Gandomi, Alavi, & Deb, 2015). All available TC techniques based on metaheuristic algorithms still face these problems. Solving the TC problem using metaheuristic algo- rithms still need more in-depth investigation for several important reasons (Y. Guo, Li,

& Shao, 2015; Mohammed, Yusof, & Husni, 2015; J. Wang, Yuan, & Cheng, 2015).

However, these reasons can be justified by the “no free lunch” theorem (Wolpert, 2013;

Wolpert & Macready, 1997).

Fourth, the core effectiveness of the TD clustering techniques relies on the similar- ity and distance functions of the TC algorithm. These functions are used in making the

decision to partition the document into an appropriate cluster based on the similarity or distance value; these decisions affect the performance of the TD clustering algorithm (Rao, Ramakrishna, & Babu, 2016). Similarity and distance measurements are stan- dard function criteria used in the TD clustering domain as an objective function. Nev- ertheless, the results of these measurements are different and lead to certain challenges because of the variance between the values of similarity and distance measures for the same document (L. M. Abualigah, Khader, & Al-Betar, 2016a; Forsati et al., 2013).

Determining the appropriate objective function to deal with the large TDs is difficult (Mukhopadhyay, Maulik, & Bandyopadhyay, 2015; Mukhopadhyay, Maulik, Bandy- opadhyay, & Coello, 2014). Multi-objective functions (multiple-criteria decision mak- ing) are currently used in several domains as an alternative technique to yield better results (George & Parthiban, 2015; Saha, Ekbal, Alok, & Spandana, 2014). How- ever, for the TD clustering technique, multiple-criteria decision making is relatively unknown.

1.3 Research Objectives

The overall aim of this study is to develop an effective TD clustering method. The main objective is to show that the improved method can outperform the other comparative methods. This research has the following objectives:

• to find the best features:

– to enhance the weight score of the terms for the text FS technique in order to improve the TD clustering;

– to improve the text FS technique for finding a new subset of more informa- tive features to improve the TD clustering;

– to reduce the dimension of the feature space in the form of a low-dimensional subset of useful features to improve the TD clustering;

• to improve the text document clustering using krill herd algorithm:

– to increase the effectiveness of the TD clustering technique and to reduce its errors;

– to improve the global search ability and its speed of convergence;

– to enhance the quality of initial solutions obtained by the local search strat- egy;

– to increase the likelihood of obtaining an accurate decision (similarity value) between the document and clusters centroids in the k-mean clustering al- gorithm.

1.4 Contributions

After the research objectives are achieved, this study will have the following main contributions:

1. Introduced a new weighting scheme to provide a significant influence score for the informative text features within the same document. This scheme focuses on assigning a favorable term weight to facilitate the text FS technique and distin- guishes among the features of the clusters by giving a high weight to essential features in the same document.

2. Adapted metaheuristic optimization algorithms (i.e., genetic algorithm (GA), harmony search (HS), and particle swarm optimization (PSO)) to find the best features at the level of each document using a new FS method.

3. Introduced a new detailed DR technique to reduce the dimensional space of text features based on the detailed term frequency (DTF) and detailed document fre- quency (DDF) of each feature compatible with the size of its effect on the docu- ment. The DDF of each feature at the level of all documents is compatible with the size of its effect on the documents in partnership with its DTF value.

4. Adapted the basic krill herd algorithm (BKHA) and tuning its parameters for the text document clustering problem.

5. The modified krill herd algorithm (MKHA) to improve the global search ability.

These modifications occur during ordering of the basic KH operators where the crossover and mutation processes are invoked after updating the positions of the krill herd algorithm (KHA).

6. The hybrid krill herd algorithm with the k-mean algorithm (HKHA) as a new operator, which plays a basic role in the MKHA to improve the local search ability. Hybridization is used to enhance the capacity of the KHA for finding locally optimal solutions by taking the refining power of the k-means clustering algorithm.

7. Introduced a multi-objective function based on the local best concept for the k- mean algorithm to enhance the capacity of the KHA by achieving an accurate local search, called multi-objective hybrid krill herd algorithm (MHKHA).

1.5 Research Scope

This study covers the main TC preprocessing steps (i.e., text FS and DR techniques) and the metaheuristic algorithms (i.e., different versions of the proposed KHA) to deal with the TDCP. The methods proposed in this study are applied to a large amount of TDs as electronic pages (i.e., newsgroup documents appearing on newswires, In- ternet web pages, and hospital information), modern applications (technical reports and university data), and biomedical sciences (large biomedical datasets). Note, all the datasets used in this research have been written in English language. These TDs (datasets) are characterized by high-dimensional informative and uninformative text features (Bharti & Singh, 2014b, 2015b; L. Zheng et al., 2015). All of the proposed methods need the number of clusters as input parameter K. Determining the correct number of clusters for the given TD datasets is an important issue because the number of document clusters is an essential parameter in TC problems. Standard TD datasets with different sizes (i.e., number of documents, number of terms, and number of clus- ters), constraints, and complexities are used in the TC technique to evaluate the pro- posed methods.

1.6 Research Methodology

This section briefly discusses the stages of the research methodology, which are applied to achieve the research objectives for improving the TD clustering technique, as shown in Figure 1.1. The detailed description is provided in Chapter 4.

The first stage is modeling and adapting GA, HS, and PSO to solve the text FS problem (TFSP) with the novel weighting scheme and detailed DR technique. This

Figure 1.1: Research methodology.

stage facilitates the TC task to deal with a low-dimensional subset of informative text features, which reduce the computation time and improve the performance of the TD clustering algorithm.

The second stage is adapting the basic KH algorithm (BKHA) and tuning its pa- rameters to solve the text DC problem (TDCP). Then, three versions of the BKHAs are modified (MKHAs) to improve the global (exploration) search ability. The three versions of the HKHA with the k-mean algorithm (MKHAs) are used to increase the performance of the TC technique by improving the local (exploitation) search ability.

These hybrid versions used the results of the k-mean algorithm as the initial solutions in KHA to ensure balance between local exploitation and global exploration. Finally, a multi-objective function is applied to obtain an accurate TC technique by combining two standard measures (i.e., cosine similarity and Euclidean distance measurements).

The multi-objective function is the primary factor used to obtain an effective clustering method by deriving an accurate similarity value between the document and the cluster centroid.

1.7 Thesis Structure

The rest of this thesis organized as follows:

Chapter 2 (Krill Herd Algorithm): This chapter discusses the principles of the KHA. The analogy between the clustering technique and the optimization terms is provided. The steps of the KHA are described in detail.

Chapter 3(Literature Review): This chapter provides an overview of the text pre- processing steps, TFSPs, and TDCPs with particular attention to TDs. This chapter also examines several methods used to deal with TFSP and TDCP. This chapter also presents a review of KHA in the areas of applications, modifications, and hybridiza- tions across many fields.

Chapter 4(Proposed Methodology): This chapter illustrates the modeling of TFSP and TDCP. This study also includes a comprehensive description of the adapted re- search methodology, including different weight schemes, metaheuristic algorithms for text FS, DR techniques, and KHAs for TD clustering, and the sequence of the proce- dures conducted.

Chapter 5(Experimental Results): This chapter shows the experiments and results of all the proposed methods and presents the comparisons of each method with the others.

Chapter 6(Conclusion and Future Work): This chapter provides the research con- clusion and possible future works.

### CHAPTER 2

### KRILL HERD ALGORITHM

2.1 Introduction

Krill herd (KH) algorithm has a unique behavior to solve the text clustering problem.

This algorithm was introduced by Gandomi and Alavi in the year 2012 to solve global optimization functions (Gandomi & Alavi, 2012). This section presents the modeling of the basic-krill herd algorithm (KHA) for the TDCP (L. M. Abualigah, Khader, Al- Betar, & Awadallah, 2016).

2.2 Krill Herd Algorithm

Krill herd (KH) is a swarm intelligence (SI) search algorithm based on the herding be- havior of krill individuals (KIs). It is a population-based approach consisting of a huge number of krill, where each krill individual (KI) moves through a multi-dimensional space to search for close food and high-density herd (swarm). In KH as optimization algorithm, positions of KIs are considered as various design variables and the distance of the KI from the food is the objective function (Gandomi & Alavi, 2012; Mandal, Roy, & Mandal, 2014). The KH algorithm is considered in three categories: (1) Evo- lutionary algorithms (2) Swarm intelligence (3) Bacterial foraging algorithm (Bolaji et al., 2016).

2.3 Why the KHA has been Chosen for Solving the TDCP

The KH is a suitable algorithm for the TC technique according to: (i) the similarities between the behavior of the KHA and the behavior of the TD clustering technique, (ii) KH algorithm obtained better results in solving many problems in comparison with others common algorithms published in the literature.

The compatibility between KHA and TC involves searching for the closest food (closest centroid) and high density groups (similar groups) (Bolaji et al., 2016). Den- sity is one of the main factors that influence the success of all the algorithms used to achieve coherence and similar groups. If documents in the same cluster are relevant, then density is high, and vice versa. If the KIs are close to the food, then density is high, and vice versa. Thus, the behavior of KIs is exactly the same as that of the TD clustering technique (both of them are a swarm).

With regard to the KHA, each KI (document) moves toward the best solution by searching for the herd (group) with high density (similar groups) and the closest food (closest centroid). These factors are used as objectives to lead each krill to an optimal herd around the food. With regard to the TC, each document moves toward the best solution by searching for the similar cluster centroid and the cluster with a high density.

Moreover, these factors are used as objectives to lead each document to an optimal cluster around the closest centroid. The relationship between the behavior of KHA and the behavior of TD clustering is considered a strong feature in applying KHA to solve the TDCP.

2.4 Krill Herd Algorithm: Procedures

Due to the nature of this research, predation disperses KIs, leads to a decrease of the average krill density and distances of the KH from the food location. This process is the initialization phase in the KH algorithm. In the natural system, the objective function of each document is supposed to be the distance or similarity from the cluster centroid. The fitness function of each candidate solution is the total distance or simi- larity between all documents with clusters centroid. The KH algorithm has three main motion calculation to update individual positions; then it applies the KH operators, which is inspired by the evolutionary algorithm. The procedures sequence of the basic KH algorithm is shown in Figure 2.1.

Figure 2.1: A flowchart of basic krill herd algorithm (Bolaji et al., 2016).

2.4.1 Mathematical Concept of Krill Herd Algorithm

The KH algorithm has three main steps to update the time-dependent position of each KI as follows:

• Movement induced by the presence of other KIs: only individual neighbors in the visual field that affects the KI moving.

• Foraging activity: the KIs search for food resources.

• Random diffusion: the net movement of each KI based on density regions (Gan- domi & Alavi, 2012).

Thei_{th}individual position is updated by the following Lagrangian model using Eq.

(2.1).

dx_{i}

dt =N_{i}+F_{i}+D_{i}, (2.1)

where for the krill i, N_{i} is the motion effect of the i_{th} individual from other KIs.

This value is estimated from the local swarm density, a target swarm density, a repul-
sive swarm density, and the target direction which is effected by the best KI.F_{i} is the
foraging motion for thei_{th} KI. This value estimated from the food attractiveness, food
location, the foraging speed, the last foraging action or movement and the best fitness
of thei_{th} krill so far. D_{i}is the physical diffusion for thei_{th} KI, where this value esti-
mated from two factors: the maximum diffusion speed of the KIs and random direction
(Gandomi, Talatahari, Tadbiri, & Alavi, 2013).

2.4.1(a) Movement Induced by other Krill Individuals

Movement induced is an illusion of visual perception in which a moving individual appears to move differently because of neighbors moving nearby in the visual field.

Theoretically, individuals try to keep the high density (Bolaji et al., 2016; G. Wang et al., 2014). The direction of movement induced is defined by Eq. (2.2).

N_{i}^{new}=N^{max}α_{i}+ω_{n}N_{i}^{old}, (2.2)

where for krill i,N^{max} is the parameter for tuning the movement induced by other
individuals, it is determined experimentally (see Table 5.11). α_{i} is estimated from the
local swarm density by Eq. (2.3),ω_{n}is the inertia weight of the movement induced by
other individuals’ in range [0, 1], andN_{i}^{old} is the last change or movement produced.

α_{i}=α_{i}^{local}+α_{i}^{target}, (2.3)

where, theα_{i}^{local} is the effect of the neighbors ini_{th} individual movement,α_{i}^{target}
is the target direction effected by the j_{th}KI. The effect of individual neighbors can be
considered as an attractive or repulsive tendency between the KIs for a local search
while the normalized values can be positive or negative (Bolaji et al., 2016; Gandomi

& Alavi, 2012). Theα_{i}^{local} is calculated by Eq. (2.4).

α_{i}^{local}=

n

## ∑

j=1Kb_{i,j}xb_{i,}_{j}, (2.4)

where, Kb_{i,}_{j} is the normalized value of the objective function vector for thei_{th} KI.

bx_{i,}_{j}is the normalized value of the related positions for thei_{th}KI. TheKb_{i,j} is calculated

by Eq. (2.5):

Kb_{i,}_{j}= K_{i}−K_{j}

K^{worst}−K^{best}, (2.5)

where, K_{i} is the objective function of i_{th} KI, K_{j} is the objective function of j_{th}
neighbor (j=1,2, ...,n). nis the number of all KIs,K^{best} andK^{worst} are the best and
worst objective function values ofi_{th} individual. Thexb_{i,}_{j}is calculated by Eq. (2.6).

bx_{i,j}= x_{j}−x_{i}
x_{j}−x_{i}

+ε

, (2.6)

where, x_{i}is the current position,x_{j}is the position of j_{th} neighbor,||x_{j}−x_{i}||is the
vector normalization, it is used for calculating the neighbors of thei_{th}KI by Eq. (2.7),
ε is a small positive number to avoid singularities (Jensi & Jiji, 2016; Mandal et al.,
2014). The sensing distance is calculated by Eq. (2.7).

de_{i}= 1
5n

n j=1

## ∑

x_{i}−x_{j}

, (2.7)

where,de_{i}is the sensing distance for the krilli. Note, if the distance value between
two KIs is less than the current value, they are neighbors. Figure 2.2 illustrates the
movement of the KIs and their neighbors.

The known target vector of each KI is the highest objective function. The effect of
the best fitness on the j_{th} individual is calculated by Eq. (2.8). This procedure allows

**Sensing Distance**

**Neighbor 3**

**Neighbor 1**

**Neighbor 2**

Figure 2.2: A schematic represents the sensing domain around a KI (Bolaji et al., 2016).

the solution to move towards the current best solution and is calculated by Eq. (2.8).

α_{i}^{target} =C^{best}Kb_{i,best}bx_{i,best}, (2.8)

where,

C^{best} =2

rand+ I
I_{max}

, (2.9)

C^{best} is the coefficient of individuals,Kb_{i,best} is the best objective function of thei_{th}
KI,bx_{i,best} is the best position of thei_{th}KI,randis a random number between [0, 1] for
improving the local exploration;Iis the current iteration number;I_{max}is the maximum
number of iterations (Gandomi & Alavi, 2012).

2.4.1(b) Foraging Motion:

The foraging motion of KIs is estimated by two effects, namely, current food and old
food location (L. M. Abualigah, Khader, Al-Betar, & Awadallah, 2016; Bolaji et al.,
2016; Mandal et al., 2014). Food area or location is defined to attract KIs to the global
optima possibly. The foraging motion fori_{th} individual is expressed by Eq. (2.10).

F_{i}=V_{f}β_{i}+ω_{f}F_{i}^{old}, (2.10)

where, V_{f} is the parameter for tuning the foraging speed, it is determined exper-
imentally (see Table 5.11), β_{i} is the food location of the i_{th} KI by Eq. (2.11), ω_{f} is
the inertia weight of the foraging speed in range [0, 1], and F_{i}^{old} is the last foraging
motion.

βi=β_{i}^{f ood}+β_{i}^{best}, (2.11)

where, β_{i}^{f ood} is the food attractiveness of thei_{th} KI, it is calculated by Eq. (2.12).

β_{i}^{best} is the best objective function of thei_{th} KI.

β_{i}^{f ood} =C^{f ood}Kb_{i,f ood}bx_{i,}_{f ood}, (2.12)

where,

C^{f ood} =2

1− I
I_{max}

, (2.13)

Kb_{i,f ood} is the normalized value of the objective function of the i_{th} centroid and
bx_{i,}_{f ood}is the normalized value of thei_{th}centroid position. The center of the individual’s
food for each iteration is calculated by Eq. (2.14).

x^{f ood}= ∑^{n}_{i=1}_{K}^{1}_{i}x_{i}

∑^{n}_{j=1}_{K}^{1}_{j}

, (2.14)

where, nis the number of the KIs,K_{i}is the objective function of thei_{th} KI, andx_{i}
is thei_{th}position value. The effect of the best objective function of thei_{th}KI is handled
by using Eq. (2.15).:

β_{i}^{best}=Kb_{i,ibest}

bx_{i,ibest}, (2.15)

where,Kb_{i,best} is the best previous objective function of thei_{th} KI,bx_{i,}_{f ood} is the best
previous visited food position of thei_{th}KI. The movement induced by other individuals
and the forging movement decrease with the increase in the time (iterations).

2.4.1(c) Physical Diffusion:

Physical diffusion is the net movement of each KI from a region of high density to a region of low density or vice versa. The better position of the KI is the less random direction. Physical diffusion values of individuals are estimated by two effects, namely,

maximum diffusion speed (D_{m}) and random directional vector (δ) (L. M. Abualigah,
Khader, Al-Betar, & Awadallah, 2016; Gandomi & Alavi, 2012; Jensi & Jiji, 2016;

G. Wang et al., 2014). Physical diffusion for thei_{th} KI is determined by Eq. (2.16).

D_{i}=D^{max}

1− I
I_{max}

δ, (2.16)

where, D^{max} is the parameter for tuning the diffusion speed, it is determined ex-
perimentally (see Table 5.11), and δ refers to the array that contains random values
between [-1, 1]. I is the current iteration,I_{max} is max number of iterations.

2.4.1(d) Updating the Krill Individuals:

The movement of thei_{th} KI is influenced by the other KIs, foraging motion, and phys-
ical diffusion. These factors seek to obtain the best objective function for each KI.

The foraging movement and the movement induced by other KIs include two global and two local strategies. These strategies are working in parallel to make KH a robust algorithm (Bolaji et al., 2016; Gandomi & Alavi, 2012; G. Wang et al., 2013). The individual positions updated towards the best objective function by Eq. (2.17).

x_{i}(I+1) =x_{i}(I) +∆tdx_{i}

dt , (2.17)

where,

∆t=C_{t}

n

## ∑

j=1

(U B_{j}−LB_{j}), (2.18)

∆tis an important and sensitive constant computed by Eq. (2.18), andnis the total
number of individuals. LB_{j} is the lower bound, U B_{j} is the upper bounds of the ith
variables(J=1,2, ....,n), andC_{t} is a constant value between [0, 2]. It works as a scale
factor of the speed vector.

2.4.2 The Genetic Operators

Genetic algorithm (GA) is a stochastic meta-heuristic search method for the global solution in a large search space. This algorithm is inspired by the classical evolutionary algorithms (EA). The genetic operators encoded in a genome that performed in an unusual way that permits asexual reproduction that leads to the offspring. However, the sexual reproduction can swap and reorder chromosomes, giving birth to offspring which includes a cross breeding of genetic information from all parents. This operation is often called a crossover, which means swapping of the genetic information. To avoid premature convergence, the mutation operator is used to increase the diversity of the solutions (H. Chen, Jiang, Li, & Li, 2013; G.-G. Wang, Gandomi, & Alavi, 2014b).

Genetic operators are incorporated into the KH algorithm to improve its performance (Bolaji et al., 2016; Gandomi & Alavi, 2012).

2.4.2(a) Crossover Operator of KH Algorithm:

The crossover operator is an effective procedure for global solutions. This procedure is controlled by a probabilityCr by generating a uniformly distributed random value

between [0, 1] (G.-G. Wang, Gandomi, & Alavi, 2014b). Themthcomponent ofx_{i,m}
is determined as the following:

x_{i,m}=

x_{p,m}, ifrand<Cr
x_{q,m} else

(2.19)

.

Cr=0.2Kb_{i,best}, (2.20)

.

where, the crossover probability is determined by Eq. (2.19). pandqrefer to the
two solutions which are chosen for the crossover operator, p,q∈ {1,2, ....,i−1,i+
1, ....,n}, the Cr increases with decreasing fitness function, Kb_{i,best} = K_{i}−K^{best}; K_{i} is
the objective function value of thei_{th} KI, andK^{best} is the best objective function value
of thei_{th}KI.

2.4.2(b) Mutation Operator of KH Algorithm:

The mutation operator is an effective strategy for a global solution. This strategy is controlled by a probabilityMu(G. Wang et al., 2014). The mutation operator is deter- mined as the following:

x_{i,m}=

x_{gbest,m}+µ(xp,m−x_{q,m}), ifrand<Mu

x_{i,m}, else

(2.21)