
### CHAPTER VIII

## CONCLUSION

### 8.1 INTRODUCTION

The primary problems addressed in this research are the high computational cost of the traditional agglomerative hierarchical clustering methods, the exclusion of the human from the data exploration process, and the knowledge and observations missed by data mining algorithms. This research improved the time complexity of the traditional agglomerative hierarchical clustering methods and reduced the number of steps needed to cluster all data points. It narrows the gap between the flood of information and the traditional hierarchical clustering algorithms by proposing a bidirectional agglomerative hierarchical clustering algorithm. It also involves users in the data exploration process, allowing them to analyze and observe large amounts of data and grasp valuable knowledge. An AVL tree approach is used to visualize the knowledge, and existing tools are used to visualize the data mining steps.

This chapter reviews the development milestones and serves as a concluding note to this research. It reviews the findings of this research from a holistic view based on the research objectives, and describes how the study's objectives were achieved. It also covers the constraints and limitations encountered during the development of this research and lists the contributions of this research.

The chapter then highlights possible enhancements and recommendations for future work, and ends with a summary of the research.

### 8.2 THE ACHIEVEMENTS OF THE STUDY'S OBJECTIVES

As described in the introduction chapter, the aim of this study is to propose and develop a bidirectional agglomerative hierarchical clustering algorithm that improves the time complexity of the traditional hierarchical algorithms and narrows the gap between the flood of information and the traditional hierarchical clustering algorithms. This study combines traditional data mining algorithms with information visualization techniques to utilize the advantages of both approaches, by developing a model for visualizing the knowledge extracted by the proposed algorithm based on the AVL tree approach. The objectives of this research and the findings related to each objective are outlined below:

Research Objective 1: To propose a bidirectional agglomerative hierarchical clustering algorithm, based on the AVL tree approach, that reduces the time complexity of the conventional hierarchical method in order to handle huge amounts of data more effectively.

The findings for this objective come from the manual analysis of applying bidirectional agglomerative hierarchical clustering with the single-link method to the Malaysian states example. If the number of objects is n, there are n/2 levels in the best case or n/2+1 levels in the worst case. At each level, the bidirectional algorithm finds a minimum in the tree T, with time complexity O(1); merges two clusters into a single cluster, which needs O(1); and finally updates the proximity tree T by deleting the nodes corresponding to the merged clusters, which needs O(log n). Results are discussed in Chapter 5.
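Taking the per-level costs stated above at face value, the total work over all levels can be written out as a short calculation. This is only a sketch: it counts the three per-level operations listed in the text and assumes that the tree T has already been built, so the cost of constructing T is not included. The symbol $T_{\text{bi}}(n)$ is introduced here for illustration and is not notation from the thesis.

```latex
\begin{align*}
T_{\text{bi}}(n)
  &\le \left(\frac{n}{2} + 1\right)\bigl(O(1) + O(1) + O(\log n)\bigr) \\
  &= \left(\frac{n}{2} + 1\right) O(\log n) \\
  &= O(n \log n).
\end{align*}
```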

Research Objective 2: To evaluate the performance of the proposed algorithm compared to the conventional approach.

The evaluation of the bidirectional agglomerative hierarchical clustering algorithm was carried out in two stages:

- Manual analysis. Results are discussed in Chapter 5.

- Comparison of the performance of the proposed algorithm with the previous algorithm. Results are discussed in Chapter 6.

The results demonstrated that the proposed bidirectional algorithm reduced the computational complexity. The results also showed that it reduced the number of calls to the data structure needed to merge all elements into one cluster to (n/2)+1 in the worst case and to (n/2) in the best case, by finding the median of the distances and clustering the objects on the left and right sides of the median. Therefore, this approach is more efficient in clustering huge amounts of data, and the performance of the bidirectional algorithm is better than that of the traditional clustering algorithms. Results are discussed in Chapter 5 and Chapter 6.


Research Objective 3: To integrate the proposed algorithm with visualization techniques, based on the AVL tree approach, in order to involve the user.

The results of the experiment revealed that all dimensions charted higher than the midpoints of their respective scales. The visualization prototype of the bidirectional agglomerative hierarchical clustering algorithm can be seen as a highly effective modality for understanding the clustering process. It enables users to stay aware of what is happening in and around their environments amid huge amounts of data, and supports interaction between the user and the data. It also captures knowledge that has been missed by data mining algorithms. Results are discussed in Chapter 7.

### 8.3 CONSTRAINTS AND LIMITATIONS

Throughout the prototype application process, there were some limitations and challenges. Listed below are those encountered in the prototype design and prototype development stages, the tasks to which most of the resources were allocated.

- Limited structure for the tree application. Since the tree is an AVL tree or binary search tree, it is impossible to display more details in a small node or to elaborate with more information.

- Limited structure for each node in an AVL tree when inserting a new duplicate node. The researcher managed this difficulty by changing the duplicate node and inserting it again in the tree.

- Time and effort spent on the design of the "look and feel" of this tree application.

- Time and effort spent on the design of the structure of the tree application and the content and text inputs of the tree.

- Learning and becoming familiar with a new programming language in order to develop the prototype application of the proposed bidirectional agglomerative hierarchical clustering algorithm, since the researcher had limited knowledge of and experience with this open source programming language.

- Time and effort spent on module testing in order to get the functionality of the initial prototype application working.
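The duplicate-node limitation in the list above arises because two pairs of objects can have exactly the same distance, while a search tree keyed on distance alone has no place for a second equal key. The thesis does not state how the duplicate node was changed before reinsertion, so the following Python sketch shows one assumed alternative, not the author's method: each key is extended with the indices of the pair, which makes equal distances distinct. The names `make_key` and `insert_unique`, and the use of a sorted list as a stand-in for the AVL tree, are illustrative assumptions.

```python
import bisect

def make_key(distance, i, j):
    """Build a composite key; the pair indices break ties between equal distances."""
    return (distance, min(i, j), max(i, j))

def insert_unique(sorted_keys, key):
    """Insert key into a sorted list (a stand-in for a tree with unique keys).

    Returns True if the key was inserted, False if it was already present.
    """
    pos = bisect.bisect_left(sorted_keys, key)
    if pos < len(sorted_keys) and sorted_keys[pos] == key:
        return False
    sorted_keys.insert(pos, key)
    return True

keys = []
# Two different pairs with the same distance 2.5 both fit in the structure.
first = insert_unique(keys, make_key(2.5, 0, 1))
second = insert_unique(keys, make_key(2.5, 2, 3))
# Re-inserting the same pair is rejected, as a unique-key tree would do.
third = insert_unique(keys, make_key(2.5, 1, 0))
```

With composite keys, the ordering by distance is preserved, so the smallest distance is still the first key in the structure.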


### 8.4 CONTRIBUTION OF STUDY

The major contributions of this study can be summarized as follows:

(a) The first contribution of this research is the proposed Bidirectional Agglomerative Hierarchical Clustering Algorithm using an AVL tree, which improves the time complexity of analyzing huge amounts of data and handles the flood of information. It also fills the gap between the growing size and number of databases and data sets and the information analysis techniques.

(b) The second contribution is the use of a visual cluster approach to visualize, with an AVL tree, the knowledge extracted by the data mining algorithm. Through the visual data mining process and the visualization of the extracted knowledge, this research includes the human in the data exploration process. It integrates visualization techniques and data mining algorithms with the intuitive power of the human mind to deal with the flood of information and to assist data miners in the cluster analysis of large numbers of databases and datasets.

(c) The third contribution of this research is the combination of the proposed bidirectional agglomerative algorithm with the AVL tree, which integrates the human mind's exploration abilities with the enormous processing power of computers to form a powerful knowledge discovery environment that capitalizes on the best of both worlds through visualization. This research therefore used the AVL tree for two purposes at the same time: to reduce the high computational cost and to visualize the knowledge extracted by the proposed algorithm.

### 8.5 RECOMMENDATIONS FOR FUTURE WORK

The work in this research has opened up new research directions and many possible enhancements in the field of algorithm development for hierarchical clustering data mining algorithms, visualization techniques and methodology. Future work can therefore be grouped under hierarchical data mining algorithms, visualization and methodology.


8.5.1 Agglomerative hierarchical clustering data mining algorithm

- This research proposed a bidirectional agglomerative hierarchical clustering algorithm based on the AVL tree approach to reduce the time complexity of analyzing datasets and to handle the flood of information. Most of the agglomerative algorithms discussed in the algorithm background chapter and found in the literature, by contrast, work implicitly or explicitly with the n × n similarity matrix, in which element (i, j) represents the similarity between the i-th and j-th data items. They use a distance matrix to represent the distances between all possible pairs of clusters. Finding the least dissimilar pair of clusters using a distance matrix needs O(n^2) time. The complexity of the proposed bidirectional agglomerative hierarchical clustering algorithm based on a distance matrix is also O(n^2), but it reduces the number of calls to the data structure needed to merge all elements into one cluster to (n/2)+1 in the worst case and to (n/2) in the best case, by finding the median of the distances and clustering the objects on the left and right sides of the median. Therefore, using or improving suitable data reduction techniques can reduce the number of data points in the distance matrix, and thus reduce the computational complexity of the bidirectional agglomerative hierarchical clustering algorithm based on the distance matrix approach.

- A study of the quality of the bidirectional agglomerative hierarchical clustering algorithm with the AVL tree approach, compared with the agglomerative hierarchical clustering algorithm and other classification methods. This study focused only on reducing the time complexity, not on the quality of the clusters.

8.5.2 Visualization

- Based on the evaluators' comments, the visualization prototype of the bidirectional agglomerative hierarchical clustering algorithm can be developed further to include additional visualization tools, and its zooming can be improved to include informative zooming, where more information is shown as the user zooms in.

- This research combines traditional data mining algorithms with information visualization techniques to utilize the advantages of the two approaches. It can therefore handle and cover massive amounts of data and the flood of information. This study used the AVL tree for two purposes: to reduce the high computational cost and to visualize the knowledge extracted by the data mining algorithm. We therefore suggest using the AVL tree to visualize the data mining steps as well, namely target data, data cleaning, data integration, data selection and data transformation, instead of using existing techniques to visualize these steps.

8.5.3 Methodology

- This study proposed a research process framework to solve the problem statement, achieve the objectives and answer the research questions. The phases and processes of the research process framework were explained in Chapter 4. Although the framework was efficient and effective in this research, it has not itself been evaluated, so we cannot argue that it is suitable and optimal for other algorithm development research or for other research that integrates data mining and visualization. Therefore, a further investigation to evaluate the research process framework is needed before it can be validated and accepted among the experts in data mining research design.

### 8.6 SUMMARY

Clustering is an analysis technique for discovering interesting distributions and patterns in a data set. The main goal of clustering is to divide objects into well-separated groups in such a way that objects lying in the same group are more similar to each other than to objects in other groups. These groups reveal the natural groupings of a set of patterns, points, or objects.

The agglomerative algorithm tools aim at knowledge discovery from databases and datasets by merging similar attributes from large numbers of databases and by identifying relationships among data points or elements. These new relationships and information can assist users in effective problem structuring and resolution. The proposed algorithm promises its users the ability to categorize, prioritize, understand and compare data, and to utilize the meaning of any particular data automatically, skipping tedious searching, browsing and reading.


This research has proposed a bidirectional agglomerative hierarchical clustering algorithm to improve the time complexity of analyzing huge amounts of data and to handle the flood of information. By using the bidirectional agglomerative hierarchical clustering algorithm, the time complexity can be significantly reduced compared with previous methods, which makes it more efficient for representing huge amounts of data. Cluster-based bidirectional agglomerative hierarchical clustering thus facilitates an efficient computational cost. With the proposed algorithm, information analysis techniques can develop on a par with the growth of the data. This will support users in the automatic and intelligent analysis of great volumes of data in order to find useful knowledge and satisfy the user's goals, while taking into account the dilemma of the high computational cost of analyzing such massive data with traditional data mining algorithms.

The bidirectional agglomerative hierarchical clustering algorithm needs to compute the distance of each cluster to all other clusters, and at each step the number of clusters decreases by one. In bidirectional agglomerative hierarchical clustering using the single-link method, if the number of objects is n, there are n/2 levels in the best case or n/2+1 levels in the worst case. Each level involves finding a minimum in the tree T, with time complexity O(1); merging two clusters into a single cluster, which needs O(1); and finally updating the proximity tree T by deleting the nodes corresponding to the merged clusters, which needs O(log n). In the traditional agglomerative clustering algorithm, by contrast, if the number of objects is n, there are n-1 levels. Each level involves finding a minimum in the matrix D, with time complexity O(n^2); merging two clusters into a single cluster, which needs O(1); and finally updating the proximity matrix D by deleting the rows and columns corresponding to the merged clusters, which needs O(n).
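The traditional procedure described above can be sketched in a few lines of Python. This is a minimal illustration of the conventional matrix-based single-link method only, not of the proposed bidirectional algorithm, whose details are given in earlier chapters. The function name `naive_single_link`, the one-dimensional sample points, and the choice to store the matrix D as a dictionary of pair distances are all assumptions made for this example.

```python
from itertools import combinations

def naive_single_link(points):
    """Conventional agglomerative clustering with a full proximity matrix.

    points: list of numbers (1-D positions, for simplicity).
    Returns the list of merges, one per level.
    """
    # Each cluster starts as a single object, identified by an integer id.
    clusters = {i: [i] for i in range(len(points))}
    # Proximity matrix D, stored as {(a, b): distance} with a < b.
    D = {(a, b): abs(points[a] - points[b])
         for a, b in combinations(clusters, 2)}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        # Finding the minimum scans every remaining pair: O(n^2) per level.
        (a, b), dist = min(D.items(), key=lambda item: item[1])
        members = clusters.pop(a) + clusters.pop(b)
        # Single link: distance to the merged cluster is the smaller of the two.
        new_row = {}
        for c in clusters:
            d_ac = D[(min(a, c), max(a, c))]
            d_bc = D[(min(b, c), max(b, c))]
            new_row[(c, next_id)] = min(d_ac, d_bc)
        # Delete the rows and columns of the two merged clusters.
        D = {pair: d for pair, d in D.items()
             if a not in pair and b not in pair}
        D.update(new_row)
        clusters[next_id] = members
        merges.append((sorted(members), dist))
        next_id += 1
    return merges

merges = naive_single_link([1.0, 2.0, 6.0, 7.5, 20.0])
```

For the five sample objects the loop runs four times, which matches the n-1 levels stated for the traditional algorithm.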

This research has also proposed a visual cluster approach, based on the AVL tree, to visualize the knowledge extracted by the data mining algorithm. Tree data structures and representations are essential in many environments, because they can easily show hierarchical organizations of data and concepts. The graphical representation of a visualized tree structure was chosen to help even an inexperienced user navigate the tree structure without much additional knowledge about the graphical notation used. By visualizing the knowledge extracted by the data mining algorithm, this research included the human in the data exploration process. It integrates visualization techniques and data mining algorithms with the intuitive power of the human mind to deal with the flood of information and to assist data miners in the cluster analysis of large numbers of databases and datasets.

This research used the AVL tree approach for two purposes: to reduce the high computational cost, by applying the proposed algorithm to the AVL tree, and to visualize the knowledge extracted by the proposed algorithm. Visualization based on the AVL tree approach provides tools that help users examine the clusters in detail and determine whether the categorization is appropriate for their data set.

In this approach, two clusters are merged into one cluster. The user is involved directly in the data mining process and interacts with it, exploiting the power of human sight and the human brain for analyzing and exploring data. It is then up to the user to determine whether or not this merging was useful. Data miners and other users can intuitively make a visual assessment, and can also evaluate the consistency of the cluster structure precisely by performing geometrical computations on their data distributions. The approach also allows them to gain insight into the data, draw conclusions, detect interesting patterns, deepen their visual understanding of datasets and interact directly with the data. It is well known that it is frequently much easier for an analyst to detect patterns in data visualizations than in raw numerical data.