DEEP PLANT: A DEEP LEARNING APPROACH FOR PLANT CLASSIFICATION

LEE SUE HAN

FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
UNIVERSITY OF MALAYA
KUALA LUMPUR

2018

DEEP PLANT: A DEEP LEARNING APPROACH FOR PLANT CLASSIFICATION

LEE SUE HAN

THESIS SUBMITTED IN FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY
UNIVERSITY OF MALAYA
KUALA LUMPUR

2018

UNIVERSITI MALAYA
ORIGINAL LITERARY WORK DECLARATION

Name of Candidate: Lee Sue Han
Registration/Matrix No.: WHA140012
Name of Degree: Doctor of Philosophy
Title of Project Paper/Research Report/Dissertation/Thesis ("this Work"): Deep Plant: A Deep Learning Approach for Plant Classification
Field of Study: Computer Vision (Computer Science)

I do solemnly and sincerely declare that:

(1) I am the sole author/writer of this Work;
(2) This Work is original;
(3) Any use of any work in which copyright exists was done by way of fair dealing and for permitted purposes and any excerpt or extract from, or reference to or reproduction of any copyright work has been disclosed expressly and sufficiently and the title of the Work and its authorship have been acknowledged in this Work;
(4) I do not have any actual knowledge nor do I ought reasonably to know that the making of this Work constitutes an infringement of any copyright work;
(5) I hereby assign all and every rights in the copyright to this Work to the University of Malaya ("UM"), who henceforth shall be owner of the copyright in this Work and that any reproduction or use in any form or by any means whatsoever is prohibited without the written consent of UM having been first had and obtained;
(6) I am fully aware that if in the course of making this Work I have infringed any copyright whether intentionally or otherwise, I may be subject to legal action or any other action as may be determined by UM.

Candidate's Signature                                 Date

Subscribed and solemnly declared before,

Witness's Signature                                   Date
Name:
Designation:

DEEP PLANT: A DEEP LEARNING APPROACH FOR PLANT CLASSIFICATION

ABSTRACT

Plant classification systems developed by computer vision researchers have helped botanists to recognize and identify unknown plant species more rapidly. Hitherto, the majority of computer vision approaches have focused on designing sophisticated algorithms to achieve a robust feature representation for plant data. For many morphological leaf features pre-defined by botanists, researchers use hand-engineering approaches for their characterization. They look for procedures or algorithms that maximize the use of leaf databases for plant predictive modelling, but this results in leaf features which are liable to change with different leaf data and feature extraction techniques. As a solution, the first part of the thesis proposes a novel framework based on Deep Learning (DL) to resolve the ambiguities of leaf features that are deemed important for species discrimination. The leaf features are first learned directly from the raw representations of the input data using Convolutional Neural Networks (CNN), and the chosen features are then examined using a Deconvolutional Network (DN) approach. Besides using solely a single leaf organ to recognize plant species, numerous studies have employed DL methods to solve the multi-organ plant classification problem. They focus on generic features, such as the holistic representation of a plant image, disregarding its organ features. In such cases, irrelevant features might be erroneously captured, especially when they appear to be discriminative for species recognition. Therefore, the second part of the thesis proposes a new hybrid generic-organ CNN architecture. Specifically, it goes beyond the regular generic description of a plant, integrating the organ-specific features with the generic features to explicitly force the designed network to focus on the organ regions during species classification. Modelling the relationship between different plant views (or organs) is important, as images captured from the same plant share overlapping characteristics which are useful for species recognition. Existing CNN-based approaches can only capture similar region-wise patterns within an image, but not the structural patterns of a plant composed of a varying number of plant view images comprising one or more organs. The third part of the thesis therefore proposes a novel framework for plant structural learning based on Recurrent Neural Networks (RNN), namely the Plant-StructNet. Specifically, it takes into consideration the contextual dependencies between varying plant views capturing one or more organs of a plant and optimizes them for species classification. In summary, the collective impact of the above contributions constitutes a more practical and feasible framework for plant identification applications. Empirical studies show that the proposed frameworks outperform state-of-the-art (SOTA) methods on the Flavia (S. G. Wu et al., 2007a) and PlantClef2015 (Joly et al., 2015) plant datasets. These findings can serve as a reference source for the research community working on plant identification and help to support future work in this area.

Keywords: Plant classification, deep learning

DEEP PLANT: KAEDAH DEEP LEARNING UNTUK KLASIFIKASI TANAMAN

ABSTRAK

Computer-aided plant identification systems created by computer vision researchers have helped botanists to recognize and identify unknown plant species more quickly. To date, computer vision researchers have focused on devising sophisticated algorithms to achieve a good feature representation of plants. For many morphological features pre-defined by botanists, researchers use feature engineering techniques to characterize plants. They focus on procedures or algorithms that can maximize the use of leaf databases for recognizing plant species. However, this leads to leaf features that vary with the leaf data and the leaf feature extraction techniques employed. As a solution, the first part of the thesis proposes a new framework based on Deep Learning to resolve the ambiguity of leaf features that are considered important for recognizing plant species. The approach learns leaf features directly from leaf databases through a convolutional neural network, and then exploits the leaf features chosen by the convolutional neural network through a Deconvolutional Network method. Besides using only a single type of plant organ to recognize plant species, many studies have used Deep Learning methods to solve the multi-organ plant classification problem. They focus on generic features, such as the holistic representation of a plant image, ignoring its organ features. In such cases, irrelevant features may be wrongly captured, especially when they appear discriminative for species recognition. Therefore, the second part of the thesis proposes a new hybrid generic-organ convolutional neural network architecture. Specifically, it can go beyond the usual generic description of a plant, integrating organ-specific features together with generic features, to explicitly direct the network to focus on organ regions during species classification. Modelling the relationship between plant images captured from different views (or organs) is important, particularly because images captured from the same plant share overlapping characteristics that are useful for species recognition. Existing convolutional neural network techniques can operate on only a single plant image, but not on the structure of a plant composed of multiple plant images containing one or more of its organs. The third part of the thesis proposes a new plant-structure framework based on a recurrent neural network, namely the Plant-StructNet. Specifically, it considers the contextual relationships between plant images captured from multiple views containing one or more organs, and optimizes them for species classification. In conclusion, the collective impact of the above contributions achieves a more practical and feasible framework for plant species identification applications. Empirical studies show that the proposed frameworks outperform SOTA methods on the Flavia (S. G. Wu et al., 2007a) and PlantClef2015 (Joly et al., 2015) plant datasets. These findings can serve as a reference source for the research community working on plant species identification, and also help to support future work in this field.

Kata kunci: Klasifikasi tanaman, deep learning

ACKNOWLEDGEMENTS

Foremost, I would like to express my sincere gratitude to my supervisor, Dr. Chee Seng Chan, for his continuous support of my PhD study and research, and for his patience, motivation, enthusiasm, and immense knowledge. His guidance helped me throughout the research and the writing of this thesis.

I would like to thank my co-supervisor, Prof. Paolo Remagnino, for all of his guidance through this process; his discussions, ideas, and feedback have been absolutely invaluable. My sincere thanks also go to Paul Wilkin and Simon Joseph Mayo. This project could not have been completed without your specific expertise and generous support.

I thank my fellow labmates Yang Loong Chang, Yuen Peng Loh, Ying Hua Tan, Chee Keng Ch'ng and Jia Huei Tan for the stimulating discussions and sharing of ideas, and also for all the fun we have had.

An honourable mention goes to my family and friends for their understanding and support in completing this project. Your love, laughter and companionship have kept me smiling and inspired.

Last but not least, I would like to thank all of those who supported me in any respect during the completion of the project.

TABLE OF CONTENTS

ORIGINAL LITERARY WORK DECLARATION  ii
ABSTRACT  iii
ABSTRAK  v
ACKNOWLEDGEMENTS  vii
TABLE OF CONTENTS  viii
LIST OF FIGURES  x
LIST OF TABLES  xiii
LIST OF SYMBOLS AND ABBREVIATIONS  xiv

CHAPTER 1: INTRODUCTION  1
1.1 Computational Botany  2
1.2 Challenges in Plant Identification Task  4
1.3 Objectives  8
1.4 Contributions  9
1.5 Outline  12

CHAPTER 2: LITERATURE REVIEW  15
2.1 Leaf Identification  15
2.2 Multi-Organ Plant Identification  21

CHAPTER 3: AUTOMATED PLANT IDENTIFICATION  27
3.1 Deep Learning  27
3.2 Features Exploration  28
3.2.1 Convolutional Neural Networks  28
3.2.2 Deconvolutional Network  33
3.2.3 MalayaKew Dataset  34
3.2.4 Discussion  40
3.3 Insights of CNN  41
3.3.1 Quantitative Analysis  41
3.3.2 Qualitative Analysis  42
3.3.3 Discussion  50
3.4 Hybrid Global-Local Leaf Feature Extraction  51
3.4.1 Architecture  52
3.4.2 Experiments  54
3.5 Summary  56

CHAPTER 4: HGO-CNN: HYBRID GENERIC-ORGAN CONVOLUTIONAL NEURAL NETWORK  58
4.1 Introduction  58
4.2 Architecture  61
4.2.1 Multi-Scale Plant Images Generation  62
4.2.2 Feature Fusion Scheme  63
4.3 Experiments on LifeClef2015 Plant Classification Challenge  65
4.3.1 Datasets and Evaluation Metrics  65
4.3.2 Performance Evaluation  67
4.3.3 Detailed Scores for Each Plant Organ  67
4.3.4 Qualitative Analysis  68
4.3.5 Failure Analysis  73
4.3.6 Model Improvement  73
4.4 Experiments on LifeClef2017 Plant Classification Challenge  77
4.4.1 Datasets and Evaluation Metric  77
4.4.2 Performance Evaluation on Validation Set  78
4.4.3 Experimental Results on Test Set  79
4.5 Summary  82

CHAPTER 5: THE PLANT-STRUCTNET  83
5.1 Introduction  83
5.2 Related Works of RNN  84
5.2.1 Architecture  85
5.3 Experiments  88
5.3.1 Performance Evaluation  89
5.3.2 Ensemble Models  94
5.4 Summary  95

CHAPTER 6: CONCLUSIONS  99
6.1 Summary  99
6.2 Limitations  101
6.3 Future Works  102

REFERENCES  104
LIST OF PUBLICATIONS AND PAPERS PRESENTED  117
APPENDIX  118

LIST OF FIGURES

Figure 1.1: Samples of leaves from the Flavia dataset (S. G. Wu et al., 2007b). In general, it is very hard to differentiate them as they are visually and semantically similar, e.g. in shape. However, botanists can easily differentiate them using their venation patterns. Note that (a) and (e) are from the 'big-fruited holly' species class, while (b), (c) and (d) are from the 'wintersweet' species class.  5
Figure 1.2: Examples of fruit organs with very similar appearance between species (right: Cornus mas L., left: Cornus sanguinea L.). However, by extending the observation to different views capturing one or more organs, such as branches and leaves, the discriminative patterns can easily be found, for example the colour and texture of the branches as well as the venation structure of the leaves.  5
Figure 1.3: (a) and (b) represent examples of plant images taken from the plants tagged with ObservationID 14982 and 6840 respectively in the PlantClef2015 dataset (Joly et al., 2015). Different plant view images of the same plant exhibit correlated characteristics in their organ structures.  7
Figure 3.1: The proposed DL framework shown in a bottom-up and top-down way to study and understand plant identification. Best viewed in electronic form.  29
Figure 3.2: The AlexNet CNN architecture used for plant identification. Best viewed in electronic form.  29
Figure 3.3: Visualization of the convolution process.  31
Figure 3.4: Visualization of the max pooling process.  31
Figure 3.5: The feature maps are flattened into a one-dimensional array before being fed into fully connected layers for classification.  32
Figure 3.6: Calculation of the feedforward and backpropagation process.  32
Figure 3.7: Visualisation strategy V1 to understand how and why the CNN works/fails. Best viewed in colour.  35
Figure 3.8: Examples of the 44 species of leaves in the MalayaKew (MK) Leaf Dataset. It consists of 626 leaf samples collected and scanned with a dedicated vacuum scanner invented by engineers from the Royal Botanic Gardens, Kew.  35
Figure 3.9: Failure analysis of the CNN model in D1. Best viewed in electronic form.  37
Figure 3.10: Feature visualisation using V1. This shows that shape (feature) is chosen in D1, while venation and the divergence between different venation orders (feature) are chosen in D2. Best viewed in colour.  38
Figure 3.11: Failure analysis of our proposed CNN model in D2.  40

Figure 3.12: Feature visualisation of layer 1. The upper row shows the top nine image patches from the training set that caused the highest activations for the selected channels. The lower row shows feature visualisation of the high activations in the feature maps based on the deconvolution.  42
Figure 3.13: The top four image patches from the training set that caused the highest activations in a random subset of channels in layers 2 to 4. Best viewed in electronic form.  43
Figure 3.14: Each column (a), (b), (c) and (d) depicts the deconvolution results of channels conv2_151, conv2_139, conv2_173 and conv2_202 on the validation set (val set), which consists of different species classes. (a), (b) and (c) Gradient changes along the leaf structures at different orientations can be seen. (d) The neurons are activated by the color of the leaf. Best viewed in electronic form.  47
Figure 3.15: Each column (a), (b), (c) and (d) depicts the deconvolution results of channels conv3_4, conv3_50, conv3_228 and conv3_265 on the validation set (val set). (a) Some kind of wave edge structures are activated. (b) Outlines of the leaf are captured. (c) Curving structures of the leaf are observed. (d) Divergent (leaf vein) structures of the leaf are activated. Best viewed in electronic form.  48
Figure 3.16: Each column (a), (b), (c) and (d) depicts the deconvolution results of channels conv4_373, conv4_170, conv4_148 and conv4_365 on the validation set (val set). (a) Venation-like features are observed. (b) Conjunctions of curvature features in certain orientations are activated. (c) Features seem to capture the sharp corners, especially in the region of the leaf tips. (d) Features are extracted based on filters that respond to corner conjunctions within certain ranges of angles. Best viewed in electronic form.  49
Figure 3.17: Each row (a), (b) and (c) depicts the deconvolution results of channels conv5_32, conv5_180 and conv5_168. Although both species have very similar leaf outline shapes, the filters (a) seem to be activated most strongly on the pinnately veined leaves, (b) leaves with serrulate margins and (c) lobes with long, narrow blades, shown in the validation set bounded by the blue outlines. Best viewed in electronic form.  50
Figure 3.18: Different types of fusion strategies. (a) Late fusion (b) Early fusion (cascade) (c) Early fusion (conv-sum).  52
Figure 4.1: Large variability in the appearance of plant organs. Even within the same organ, large differences can occur. Besides, for images taken in the outdoor field, clutter in the background makes recognizing plant species more difficult.  59

Figure 4.2: Overview of the HGO-CNN framework. (a) The architecture of the HGO-CNN; (b) multi-scale plant image generation: given a plant image, the training images are isotropically rescaled to three different sizes, 256, 384 and 512; then, for the 384 and 512 image sizes, the 256 × 256 centre pixels are cropped; (c) the HGO-CNN feature fusion scheme: (i) during training, the two-path CNN is initially pretrained on the ImageNet dataset (Russakovsky et al., 2015); (ii) one of the CNN paths is then repurposed for the organ task, while (iii) the other CNN path is repurposed for the generic task; (iv) finally, new species layers are introduced to train the correlation between the organ and generic components.  62
Figure 4.3: Species of stem images: (a) Acer pseudoplatanus L.; (b) and (d) Acer saccharinum L.; (c) Aesculus hippocastanum L.  69
Figure 4.4: Visualization of the last convolution of the generic, organ and species layers for the test images. Color contrast is digitally enhanced. It is noticeable that the features learned in both the organ and generic layers extract complementary information for better modeling of a plant species. Figure is best viewed in electronic form. (cont.)  70
Figure 4.4: Continued.  71
Figure 4.5: Misclassified examples. The projected fc7 features of the misclassified images (left) are found to have almost similar feature patterns to the wrongly classified species classes (right).  72
Figure 4.6: Samples of images from the trusted training set.  76
Figure 4.7: Samples of images from the noisy training set.  76
Figure 4.8: Results of the LifeClef2017 multi-organ plant classification task (adopted from the website: http://www.imageclef.org/lifeclef/2017/plant).  80
Figure 5.1: The architecture of the proposed Plant-StructNet in classifying different plant views capturing one or more organs of a plant. Each state of the network stores the information of one plant view.  86
Figure 5.2: The 3-stage cascaded attention module.  91
Figure 5.3: Percentage of images that fall under category A for each organ category (%).  92
Figure 5.4: Feature embedding visualizations of the Plant-StructNet using t-SNE. (a) Image visualization. (b) Scatter plot: points with the same color and symbol are features belonging to the same species class. Best viewed in electronic form.  96
Figure 5.5: Feature embedding visualizations of the E-CNN using t-SNE. (a) Image visualization. (b) Scatter plot: points with the same color and symbol are features belonging to the same species class. Best viewed in electronic form.  97

LIST OF TABLES

Table 2.1: Summary of related studies.  18
Table 2.2: Summary of the organ-specific feature extraction approaches.  24
Table 2.3: Summary of the generic feature extraction approaches.  25
Table 3.1: Performance comparison on the MK leaf dataset with different classifiers. MLP = Multilayer Perceptron, SVM = Support Vector Machine, and RBF = Radial Basis Function.  36
Table 3.2: Performance comparison on the Flavia leaf dataset. FD = Fourier descriptors, SDF = Shape defining features, RF = Random forest, NN = Nearest neighbours and ANN = Artificial neural network.  42
Table 3.3: Top-1 classification accuracy results of our proposed models. Note that LF = late fusion, EF = early fusion, W = whole leaf, P = patches.  55
Table 4.1: Performance comparison with the other best plant identification systems evaluated in the LifeClef2015 challenge. Note that M-S = multi-scale.  67
Table 4.2: Classification performance comparison of each content based on S_img.  68
Table 4.3: Evaluation of different improvement strategies for M-S HGO-CNN.  75
Table 4.4: Classification performance comparison of each content based on S_img for the enhanced models.  75
Table 4.5: Performance comparison.  78
Table 4.6: Performance comparison for the trusted training set (EOL).  81
Table 4.7: Performance comparison for the noisy training set (WEB).  81
Table 4.8: Performance comparison for the noisy training set (WEB+EOL).  81
Table 5.1: Performance comparison between the Plant-StructNet and the E-CNN. Note that attn is the attention mechanism, ns is the number of stages of the attention module and Fsm is the forward-states modelling.  91
Table 5.2: Comparison of top-1 classification accuracy for different categories of observation ID. Note that Category A = number of images < 2 per observation ID; Category B = number of images ≥ 2 per observation ID.  92
Table 5.3: Classification performance comparison of each content based on S_img.  92
Table 5.4: Evaluation of the ensemble models.  94
Table 5.5: Performance comparison with SOTA based on S_img. Note that values with (∗) are the results originally reported in (M. M. Ghazi et al., 2017).  95

LIST OF SYMBOLS AND ABBREVIATIONS

BOW      Bag of Words
CCH      Circular Covariance Histogram
CNN      Convolutional Neural Networks
DL       Deep Learning
DN       Deconvolutional Network
EOH      Edge Orientation Histogram
FV       Fisher Vector
GANs     Generative Adversarial Networks
GMM      Gaussian Mixture Model
GRU      Gated Recurrent Unit
HGO-CNN  Hybrid Generic-Organ Convolutional Neural Network
HOG      Histogram of Oriented Gradients
KNN      k-Nearest Neighbour
LBP      Local Binary Patterns
LSTM     Long Short-Term Memory
RNN      Recurrent Neural Networks
SC       Shape Context
SIFT     Scale-Invariant Feature Transform
SOTA     State-of-the-art
SURF     Speeded-Up Robust Features
SVM      Support Vector Machine
WP       Weighted Probability

CHAPTER 1: INTRODUCTION

Plants are the backbone of all life on earth, providing us with food and oxygen. A good understanding of plants is essential to help in identifying new or rare plant species in order to improve the drug industry, balance the ecosystem, and support agricultural productivity and sustainability (Cope et al., 2012). With a growing human population and a changing climate, there is an increasing threat to many ecosystems; in fact, biodiversity is declining steadily throughout the world, mainly due to direct or indirect human activities. It is therefore becoming increasingly important for people to build accurate species knowledge to recognize unknown plant species and to explore the geographic distribution of plants for future biodiversity conservation (Wäldchen & Mäder, 2017).

The traditional approach involves a manual identification process, that is, training taxonomists who can examine specimens and assign taxonomic labels to them. However, a problem known as the "taxonomic crisis" exists, caused by an increasing shortage of skilled taxonomists and nonexistent taxonomic knowledge within the general public (Cope et al., 2012). Categorization of plants hence remains a tedious task due to the limitations of knowledge and information about the world's plant families. This situation is further exacerbated by the difficulty of traditional plant species identification, which requires substantial botanical expertise, putting it beyond the reach of most nature enthusiasts, for example the general public and the professionals who deal with botanical problems daily, such as conservationists, farmers, foresters, and landscape architects. For this reason, taxonomists have started to seek methods that can meet species identification requirements, such as developing digital image processing and pattern recognition techniques (Wäldchen & Mäder, 2017).

1.1 Computational Botany

Computational botany consists of applying innovative computational methods to help progress on an age-old problem, i.e. the identification of the estimated 400,000 species of plants on Earth (Govaerts, 2001). This interdisciplinary approach combines botanical data and species concepts with computational solutions for the classification of plants or parts thereof, and focuses on the design of novel recognition methods. These are modelled using botanical data, but are extendable to other large repositories and application domains. Plant species identification is a subject of great importance in many fields of human endeavour, including areas such as agronomy, conservation, environmental impact assessment, natural product and drug discovery, and other applied areas. Remote survey of the Earth by real-time satellite monitoring provides huge quantities of data on natural and man-made vegetation types, offering the possibility of fine-scale and remote plant identification using automated pattern analysis systems (Nagendra & Rocchini, 2008). Robot technology is driving studies on automatic plant identification in agronomic research aimed at crop improvement through recognition of crop plants and elimination of weeds (L. Qi et al., 2009). Despite these obviously important possibilities, automatic plant species recognition, a foundational capability in this context, is nevertheless still in its early stages.

Recent progress in computer vision makes it possible to assist botanists in plant identification tasks. Through image analysis based on computational tools, raw image data are converted to a suitable internal representation, the so-called feature vector, from which classifiers or machine learning mechanisms can recognize and classify the patterns of the input. In recent years, built-in digital cameras of mobile devices have become ubiquitous, increasing interest in creating hand-held field guides. With this technology, users can receive instant information about pictures of plants taken through an installed recognition application, such as a list of possible species (Joly et al., 2014;
In recent years, build-in digital cameras of a mobile device have become ubiquitous, increasing interest in creating hand-held field guides. With this technology, users could receive instant information about pictures of plants taken through an installed recognition application, such as likely a list of possible species (Joly et al., 2014;. 2.

N. Kumar et al., 2012). Hence, it is undeniable that a computer-aided plant identification system is able not only to assist botanists but also to benefit non-professionals in plant identification tasks.

Plants are complex living organisms sustained by a number of organ systems. A number of approaches have been proposed in the literature for the automatic analysis of botanical organs, such as leaves and flowers (Joly et al., 2015; S. G. Wu et al., 2007a; Zhang et al., 2013). In botany, leaves are generally used to supply important diagnostic characters for plant classification, and in some groups exclusively so. Since the early days of botanical science, plant identification has been carried out with traditional text-based taxonomic keys that use leaf characters, among others. For this reason, researchers in computer vision have used leaves as a comparative tool to classify plants (Hall et al., 2015; Kadir et al., 2013; Kalyoncu & Toygar, 2015; N. Kumar et al., 2012). Characters such as shape (Mouine et al., 2012; Neto et al., 2006; Xiao et al., 2010), texture (Cope et al., 2010b; Naresh & Nagendraswamy, 2016; Tang et al., 2015) and venation (Charters et al., 2014; Larese et al., 2014) are the features most generally used to distinguish the leaves of different species. In contrast to studies on leaves, a smaller number of studies identify species solely based on flowers. Most of them focus on the flower region as a whole (Apriyanti et al., 2013; Cho, 2012; Cho & Lim, 2006; Hong & Choi, 2012; Hsu et al., 2011; Huang et al., 2009; Nilsback & Zisserman, 2006, 2008; Phyu et al., 2012; W. Qi et al., 2012; Zawbaa et al., 2014), while some analyze parts of the flower such as its petals (Nilsback & Zisserman, 2006; Tan et al., 2012) and pistils (Hsu et al., 2011), using characters such as shape, texture and color.
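The approaches above share a two-stage shape: reduce each raw image to a fixed-length feature vector, then hand that vector to a conventional classifier. The sketch below is a deliberately minimal illustration of that division of labour; the descriptor and the nearest-centroid rule are simplified stand-ins, not the features or classifiers used in any of the cited studies.

```python
import numpy as np

def extract_features(img):
    """Reduce a raw 2-D grayscale image to a fixed-length feature vector.

    A deliberately crude descriptor for illustration only: the global mean
    intensity plus the mean intensity of each quadrant (a rough proxy for
    the shape/texture characters discussed above).
    """
    h, w = img.shape
    quadrants = [img[:h // 2, :w // 2], img[:h // 2, w // 2:],
                 img[h // 2:, :w // 2], img[h // 2:, w // 2:]]
    return np.array([img.mean()] + [q.mean() for q in quadrants])

def nearest_centroid_predict(train_feats, train_labels, query_feat):
    """Toy classifier: assign the class whose mean feature vector is closest
    (standing in for the SVM / k-NN classifiers used in the literature)."""
    labels = sorted(set(train_labels))
    mask = np.array(train_labels)
    centroids = {c: train_feats[mask == c].mean(axis=0) for c in labels}
    return min(labels, key=lambda c: np.linalg.norm(query_feat - centroids[c]))

# Two synthetic "species": bright leaf images vs. dark ones.
rng = np.random.default_rng(0)
bright = [np.clip(rng.normal(0.8, 0.05, (32, 32)), 0, 1) for _ in range(5)]
dark = [np.clip(rng.normal(0.2, 0.05, (32, 32)), 0, 1) for _ in range(5)]
X = np.stack([extract_features(im) for im in bright + dark])
y = ["species_A"] * 5 + ["species_B"] * 5

query = extract_features(np.clip(rng.normal(0.78, 0.05, (32, 32)), 0, 1))
print(nearest_centroid_predict(X, y, query))  # → species_A
```

The point of the sketch is where the discriminative power lives: everything useful must be hand-encoded in `extract_features`, which is exactly the fragility discussed in Section 1.2.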
Ever since ImageCLEF, one of the foremost visual image retrieval campaigns, began hosting a plant identification challenge in 2011, researchers have started to focus on the automatic analysis of multiple images exploiting different views of a plant capturing one or more organs. Contrary to other approaches that analyze a single organ captured in one image, multi-organ classification approaches (Dimitrovski et al., 2014; Goëau et al., 2014; Paczolay et al., 2014; Szűcs et al., 2014; Yanikoglu et al., 2014) have been proposed to address the challenge. Since 2014, PlantCLEF has provided up to seven different plant views, namely full plant, branch, flower, leaf, leaf scan, fruit, and stem. Researchers generally adopt organ-specific features for discrimination. They first group the plant images into their respective organ categories. Then, for each organ category, organ-specific features are extracted using feature engineering approaches such as the Scale-Invariant Feature Transform (SIFT), Bag of Words (BOW), Speeded-Up Robust Features (SURF), Gabor filters and Local Binary Patterns (LBP). During the species classification stage, the computed features in each organ category are trained individually using conventional machine learning algorithms such as the Support Vector Machine (SVM), k-means clustering, the Weighted Probability (WP) approach, nearest neighbour classifiers and random forests.

1.2 Challenges in Plant Identification Task

Although morphometrics and image processing are well-established and broad disciplines, botanical morphometrics presents some specific challenges for computer vision researchers. One of the main challenges is the variation in inter- and intra-specific plant traits, which increases the difficulty of elucidating patterns of plant species for plant community ecology. In such a case, designing a plant identification system inevitably depends strongly on the ability of experts to encode domain knowledge. Nevertheless, this kind of information is usually only partially available or incomplete for non-specialist users. For example, as shown in Fig. 1.1, samples of leaves collected from the 'big-fruited holly' and 'wintersweet' species are very similar, especially in their outline shape.
This shows that although shape is the most commonly used feature, for certain species it may not be feasible or appropriate for discrimination. The history of plant identification methods, however, shows that existing plant identification solutions are mostly hand-crafted. For many

morphological leaf features pre-defined by botanists, researchers use hand-engineering approaches for their characterization (Cope et al., 2010b; N. Kumar et al., 2012; Naresh & Nagendraswamy, 2016). They look for procedures or algorithms that can get the most out of the leaf data for predictive modeling. Then, based on their performance, they justify the subset of features that is most important for describing the leaf data. However, these features are liable to change with different plant data or feature extraction techniques.

Figure 1.1: Samples of leaves from the Flavia dataset (S. G. Wu et al., 2007b). In general, it is very hard to differentiate them as they are visually and semantically similar, e.g. in shape. However, botanists could easily differentiate them using their venation patterns. Note that (a) and (e) are from the 'big-fruited holly' species class, while (b), (c) and (d) are from the 'wintersweet' species class.

Figure 1.2: Examples of fruit organs with very similar appearance between species (right: Cornus mas L., left: Cornus sanguinea L.). However, by extending the observation to different views capturing one or more organs, such as branches and leaves, the discriminative patterns can easily be found, for example the color and texture of the branches as well as the venation structure of the leaves.
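To make this kind of hand-engineered pipeline concrete, the following sketch computes a simple LBP-style texture histogram and classifies leaves with a nearest-centroid rule. It is purely illustrative: the synthetic images stand in for real leaf scans, and the descriptor is far cruder than those used in the cited studies.

```python
import numpy as np

def lbp_histogram(img):
    """8-neighbour local binary pattern codes, pooled into a 256-bin histogram."""
    img = np.asarray(img, dtype=np.int32)
    centre = img[1:-1, 1:-1]
    code = np.zeros(centre.shape, dtype=np.uint8)
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(shifts):
        # Neighbour plane shifted by (dy, dx); each comparison sets one bit.
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code |= (nb >= centre).astype(np.uint8) << bit
    hist = np.bincount(code.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

def nearest_centroid(centroids, x):
    """Predict the class whose mean histogram is closest in L1 distance."""
    return min(centroids, key=lambda k: np.abs(centroids[k] - x).sum())

# Synthetic stand-ins for two "species": noisy vs. smoothly shaded texture.
rng = np.random.default_rng(0)
noisy = [rng.integers(0, 256, (32, 32)) for _ in range(10)]
smooth = [np.add.outer(np.arange(32) * 8, rng.integers(0, 4, 32)) for _ in range(10)]

centroids = {
    "noisy": np.mean([lbp_histogram(i) for i in noisy[:8]], axis=0),
    "smooth": np.mean([lbp_histogram(i) for i in smooth[:8]], axis=0),
}
```

As the surrounding text argues, a descriptor like this must be re-justified for every new dataset: the histogram bins that separate these two synthetic textures say nothing about which bins would matter for real leaf scans.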

Using this kind of plant identification method is impractical, especially when dealing with a huge number of species classes. In such cases, automated feature learning has become indispensable in this field of study. Therefore, the first part of the thesis looks for an automated feature learning technique to replace the necessity of human hand-design.

Besides that, due to the intra- and inter-species diversity of plants in nature, some species are difficult or impossible to differentiate from one another using only a single plant organ. Hence, people normally extend their observation to multiple organs to find the discriminative patterns for species identification. For example, as shown at the top of Fig. 1.2, the images of fruits are visually similar. Using solely a single image of a fruit organ makes it considerably hard to differentiate between species. However, if one extends the observation to multiple organs such as branches and leaves (as shown at the bottom of Fig. 1.2), together with fruits, the discriminative patterns of the plants can easily be found as a significant cue for plant recognition, for example the differences in the appearance of the branches as well as the venation of the leaves. On the other hand, there are times when certain organs are not in season; for example, during winter one can only observe the bark of a deciduous plant. It is therefore known to be more informative to recognise multi-organ plant images. However, recognising plant images of different organs is a challenging task owing to the large variability in the appearance of plant organs. Venturing into the DL approach, the second part of the thesis focuses on designing an automated multi-organ plant classification system to ease the plant identification task.

Apart from that, the relationship between the different plant views (or organs) captured of a plant is worth considering for plant identification.
It is noticeable that images captured from different plant views of a plant contain structural information that is not mutually exclusive. In fact, they share overlapping characteristics which are useful for species recognition. For example, as shown in Fig. 1.3(a) or (b), it can be seen that plant view images of the same plant exhibit correlated or overlapping characteristics in their

organ structures. However, comparing Fig. 1.3(a) with Fig. 1.3(b), which are captured from different plant entities, the two sets of characteristics are obviously very different. Henceforth, the third part of the thesis explores a new plant classification framework that takes

Figure 1.3: (a) and (b) represent examples of plant images taken from the plants tagged with ObservationID 14982 and 6840 respectively in the PlantClef2015 dataset (Joly et al., 2015). Different plant view images of the same plant exhibit correlated characteristics in their organ structures.

into account the contextual dependencies between varying plant views capturing one or more organs of a plant.

1.3 Objectives

Sec. 1.2 presents the challenges faced in the plant identification task, which directly shows the importance of automated feature learning in this field. Based on these challenges, three objectives are set to improve the efficiency of the plant identification system.

Traditional approaches that rely on hand-crafted features are not practical for the plant classification task, mainly due to the dataset dependency issue. Manually designing appropriate feature representations for every plant dataset is not an easy task, especially when there is large variability in the appearance of plant organs as well as variation in inter- and intra-specific plant traits. To address these challenges, an automated feature learning approach is explored to decide the best features for elucidating patterns of plant species, overcoming the inadequacy of traditional hand-crafted approaches. Hence, the first objective of this thesis is to develop better reasoning to model plant species based on a feature learning approach called DL. A CNN-based model, one of the most widely used DL methods, is designed to exploit the most discriminative leaf features in order to fit the general purpose of species identification, and a DN is employed to explore the prominent features learned.

Besides, there are certain circumstances in which botanists need to extend their observation to multiple organs, when a single organ cannot provide sufficient information to elucidate patterns of plant species. However, recognising plant images of different organs is a challenging task owing to the large variability in the appearance of plant organs. Hence, to address this challenge, the second objective is to propose a new multi-organ plant classification system based on the DL approach.
The architecture designed can go beyond the regular generic description of a plant, integrating the organ-specific features

together with the generic features to explicitly force the network to focus on the organ regions during species classification.

Apart from that, images captured from a plant are interrelated, as they belong to the same plant specimen. In fact, they can be seen as different views or structures of a plant. In such cases, the relationship between these plant images is important and worth taking into consideration. Hence, the third objective is to propose a new plant classification framework based on RNN. Instead of using an RNN-based method to process pixel-level (Ren & Zemel, 2017; Romera-Paredes & Torr, 2016) or region-level (Ba et al., 2015; Gregor et al., 2015) information of an image, the RNN-based method is formulated to process the structural-level information of a plant based on several images captured from its various organs or from different viewpoints of a similar organ.

1.4 Contributions

The main aim of this thesis is to propose a better solution for classifying plant species based on the DL approach. Moreover, to gain understanding of how DL improves the performance of species identification, studies are done to explore the insights of DL, finding out the characteristics of the features learned. To draw the readers' attention to the significance of the deep learning approach for plant classification, the term 'Deep Plant' is used as emblematic of two important domains: Deep learning and Plant classification.

Technically, three main contributions are presented in this thesis:

Contribution 1: This thesis introduces an automated plant identification system based on DL. It explores a novel DL approach to learn discriminative features from leaf data. Although considerable techniques have been introduced in previous publications (Cope et al., 2010b; N. Kumar et al., 2012; Naresh & Nagendraswamy, 2016), ambiguity surrounding the features that best represent the leaf data still remains unresolved. For this

reason, a way to quantify the features necessary to represent leaf data is defined. Firstly, an AlexNet (Krizhevsky et al., 2012) CNN model is trained using raw leaf data; then a DN approach is used to examine how the CNN model characterizes the leaf data. In this work, the potential of the CNN in leaf identification is remarkably explored, showing its superiority over ordinary hand-engineered approaches.

Based on the qualitative and quantitative analysis, unexpected results are reported: (1) different orders of venation are better representative features than outline shape, and (2) multi-level representation is observed in leaf data, demonstrating the hierarchical transformation of features from lower-level to higher-level abstraction, corresponding to species classes. These findings are found to fit the hierarchical botanical definitions of leaf characters. Apart from that, according to the outcomes of the DN, the CNNs trained on whole leaves and on leaf patches are found to exhibit different contextual information of leaf features. The features learned are analogous to global features that describe the whole leaf structure and local features that focus on venation. Based on these findings, new hybrid global-local feature extraction models are designed, and they are found to further improve the discriminative power of leaf identification. The proposed works were accepted in Pattern Recognition (2017) and at the IEEE International Conference on Image Processing (ICIP 2015), Quebec, Canada, respectively.

Contribution 2: To recognize plant species, botanists usually observe multiple plant structures captured from the same plant to overcome the feature ambiguities between species brought about by the intra- and inter-species diversity of plants in nature. For example, as shown in Fig.
1.2, incorporating observations from multiple plant organs such as branches, leaves and fruits provides a better understanding of the discriminative patterns used to distinguish plant species. There are also times when people extend their observation to multiple views of a similar organ, when other types of organs are not in season. Therefore, in connection with the aforementioned factors, the second contribution focuses on introducing a new automated multi-organ plant identification system, namely the Hybrid Generic-Organ Convolutional Neural Network (HGO-CNN).

Although promising solutions built using DL enable representative features to be learned for multi-organ plant images, the existing approaches focus mainly on generic features (Champ et al., 2015; Ge et al., 2015) for species classification, disregarding the features representing the organs. However, plants are complex living organisms sustained by a number of organ systems. Hence, in this work, the HGO-CNN is designed to take into account both organ and generic information, combining them using a new feature fusion scheme for species classification. The empirical results show that the HGO-CNN outperforms all the best plant identification systems evaluated in the LifeClef2015 challenge (Champ et al., 2015; Ge et al., 2015). The proposed work was accepted at the IEEE International Conference on Image Processing (ICIP 2017), Beijing, China.

Contribution 3: Although the existing CNN-based methods can extract the discriminative features of a plant image without the need for hand-engineered features, they have been designed to operate on a single plant image. This in turn makes them incapable of modeling the contextual dependencies between the varying plant views (or organs) captured of a plant. For this reason, this thesis moves beyond existing practice, proposing a new framework of plant structural learning based on RNN, namely the Plant-StructNet.

The Plant-StructNet can model high-level contextual dependencies between plant views comprising different organs or different viewpoints of a similar organ. This approach as such supports classification based on a varying number of plant view images. Finally, based on the quantitative results, it is observed that the Plant-StructNet performs better than the HGO-CNN.
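The core idea of folding a variable number of per-view features into one observation-level prediction can be sketched with a plain recurrent step. The sketch below is a minimal NumPy illustration with random, untrained weights; it is not the actual Plant-StructNet architecture, and all dimensions are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def init_rnn(feat_dim, hidden_dim, n_species, scale=0.1):
    """Random, untrained weights for a vanilla recurrent step."""
    return {
        "Wx": rng.normal(0, scale, (hidden_dim, feat_dim)),
        "Wh": rng.normal(0, scale, (hidden_dim, hidden_dim)),
        "b": np.zeros(hidden_dim),
        "Wo": rng.normal(0, scale, (n_species, hidden_dim)),
    }

def predict_species(params, view_features):
    """Fold a variable-length list of per-view feature vectors (e.g. CNN
    features of a flower, a leaf and a stem image of the same plant) into
    a single softmax distribution over species."""
    h = np.zeros(params["Wh"].shape[0])
    for x in view_features:                      # one recurrent step per view
        h = np.tanh(params["Wx"] @ x + params["Wh"] @ h + params["b"])
    logits = params["Wo"] @ h                    # observation-level scores
    e = np.exp(logits - logits.max())
    return e / e.sum()

params = init_rnn(feat_dim=128, hidden_dim=64, n_species=10)
observation = [rng.normal(size=128) for _ in range(3)]   # three views of one plant
probs = predict_species(params, observation)
```

The point of the recurrence is exactly the flexibility claimed above: `predict_species` accepts one view or ten without any change to the model.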
Through t-SNE (Maaten & Hinton, 2008) feature visualisation, the Plant-StructNet features are found to be more semantically separable than those of the HGO-CNN-based model, reflecting the quantitative results obtained. The proposed work

was submitted to IEEE Transactions on Image Processing.

1.5 Outline

This chapter provides an overview of the work presented in this thesis. It emphasizes the problem statements, objectives and contributions. The following is a brief introduction to the rest of the chapters.

Chapter 2: This chapter provides a critical and comprehensive review of existing methods and a description of the context of plant identification, i.e. how species are delimited by botanists using morphology. It begins with background studies of plant identification based on plant leaves, and then proceeds to revisit the approaches used to classify multi-organ plant images. It provides thorough studies of the variety of features exploited, and also of the feature extraction techniques. Based on the literature review, it highlights parallel streams of research and motivates greater efforts to solve a range of important practical problems.

Chapter 3: This chapter begins with an introduction to DL, and then the idea of DL for automatic processing and classification of big data. Next, it describes the architecture of the AlexNet CNN together with its underlying computational theory, and then introduces the methodology that fits the specific purpose of leaf identification. After introducing the CNN, a DN approach is described to explore the representative features learned by the CNN. It also shows how the underlying computation can be modified to provide another viewpoint of feature visualization.

The first experiment shows that venation structure is a very important feature for identification, especially when the shape feature alone is inadequate. This is verified by checking the global response of the filters in each convolution layer using the proposed V1

feature visualization strategy. It furthermore examines the local response of individual filters in each convolution layer and shows a hierarchical transformation of features from low-level to high-level abstraction throughout the convolution layers. These findings are shown to fit the hierarchical botanical definitions of leaf characters. Finally, it introduces novel hybrid models exploiting the correspondence of different contextual information of leaf features. The experimental results show that the hybrid local-global features learned using DL can improve recognition performance compared to previous techniques.

Chapter 4: Contrary to the previous chapter, which investigates the use of DL to harvest discriminative features from leaf organ images alone and use them to classify plant species, this chapter focuses on identification based on multi-organ plant images. These plant organs include entire plants, branches, flowers, leaves, fruits and stems. It proposes a novel architecture, namely the HGO-CNN, to classify multi-organ plant images based on the correlation between the chosen organ-based and generic features. In order to train the HGO-CNN to capture prior organ information, and subsequently integrate both generic and organ-based information for species classification, it introduces a feature fusion scheme based on a novel step-by-step training strategy.

To increase the robustness of the model in recognising multi-organ plant images, it describes the augmentation of the dataset for training and testing. Apart from that, it also presents various improvements made to enhance the discriminative ability of the HGO-CNN. Besides quantitatively analyzing the HGO-CNN, it goes deeper into exploring the learned features in the organ, generic and species layers using the DN approach.
It reports that both the organ and generic features learned in the HGO-CNN exhibit different contextual information of a plant image, and the correspondence of both components is shown to be able to drive the network to better characterize a plant species.

Chapter 5: This chapter extends the work of the previous chapter on the multi-organ plant identification problem. Nonetheless, it focuses on the automatic analysis of multiple images based on RNN, exploiting different views of the same plant capturing one or more organs. Inspired by the success of RNN in modeling long-term dependency, this chapter first revisits RNN and its varying applications. It shows that the current work builds on the foundations laid in several of these existing RNN-based approaches.

Next, it proposes a new framework of plant structural learning based on RNN, namely the Plant-StructNet. In contrast to the HGO-CNN, this new system takes in a varying number of plant view images composed of one or more organs, and offers extra flexibility in learning the relationship between them for species classification. The architecture details are explained, and comprehensive experiments are subsequently presented to evaluate the performance of the Plant-StructNet against the HGO-CNN. Last but not least, it explores the features learned in both the HGO-CNN and the Plant-StructNet by means of feature visualisation, and shows that the outcomes reflect the quantitative results. With these findings, it further confirms the effectiveness of the Plant-StructNet.

Chapter 6: This chapter concludes the work and provides suggestions for future work.

CHAPTER 2: LITERATURE REVIEW

This chapter provides a comparative and comprehensive review of existing methods and a description of the context of plant identification. It begins with background studies of plant identification based on the leaf organ, and then proceeds to revisit the approaches used to classify multi-organ plant images. It provides thorough studies of the variety of features exploited and also of the feature extraction techniques.

2.1 Leaf Identification

In computer vision, several researchers have exploited pattern recognition and image processing techniques to develop plant recognition systems based on plant leaves. The leaf has long been known to be one of the most widely used plant organs for automatic species identification (Wäldchen & Mäder, 2017). Researchers have focused on a small subset of the leaf characters employed by botanists themselves, such as shape, texture and color features. In general, shape and texture features are extensively used in plant identification because their characteristics are more consistent throughout the seasons. Color features are predominantly employed together with shape or texture in order to build a more discriminative feature for species recognition. Apart from that, there have been a few studies employing venation structure to classify species. The various feature extraction methods that have been proposed to classify species based on different kinds of features are reviewed below.

Shape. Most studies use shape recognition techniques to model and represent the contour shape of the leaf. In one of the earliest papers, Neto et al. (2006) introduced Elliptic Fourier and discriminant analyses to distinguish different plant species based on their leaf shape. Chaki & Parekh (2011) proposed two shape modeling approaches based on the invariant-moments and centroid-radii models. Du et al. (2007) proposed combining

geometrical features (aspect ratio, rectangularity, area ratio and perimeter ratio of convex hull, sphericity, circularity, eccentricity and form factor) and invariant moments to extract the morphological structures of leaves. Ever since the techniques of Shape Context (SC) and Histogram of Oriented Gradients (HOG) were proposed and tested for their effectiveness in many computer vision tasks, attempts have been made to create leaf shape descriptors using them (Mouine et al., 2012; Xiao et al., 2010). Xiao et al. (2010) proposed HOG for building leaf image descriptors and the Maximum Margin Criterion (MMC) to reduce dimensionality. Mouine et al. (2012) proposed to use the SC to model local as well as spatial relation information of selected salient points of the leaf. Recently, Aakif & Khan (2015) proposed using different shape-based features such as morphological features, Fourier descriptors and a newly designed Shape-Defining Feature (SDF). Although the algorithm showed its effectiveness on a baseline dataset like Flavia (S. G. Wu et al., 2007a), the SDF is highly dependent on the segmentation result of the leaf images. Hall et al. (2015) proposed using Hand-Crafted Shape (HCS) and Histogram of Curvature over Scale (HoCS) (N. Kumar et al., 2012) features to analyse leaves. Zhao et al. (2015) proposed a new counting-based shape descriptor, namely the independent-IDSC (I-IDSC) feature, to recognize simple and compound leaves. Apart from studying the whole shape contour of the leaf, some studies (Cope & Remagnino, 2012; Kalyoncu & Toygar, 2015) analysed leaf margins for species classification. For example, Kalyoncu & Toygar (2015) introduced a novel margin descriptor, while Cope & Remagnino (2012) proposed using the Dynamic Time Warping (DTW) technique to extract margin features.

There are also some groups of researchers who are incorporating plant identification into mobile computing technology, such as Leafsnap (N. Kumar et al., 2012) and Apleafis (Ma et al., 2013). Leafsnap identified plants based on curvature-based shape features of the leaf by utilizing integral measures to compute functions of the curvature at the boundary. The identification is then carried out by nearest neighbours (NN). Apleafis

identified leaves using color and shape-based features. The authors concluded that shape features are more suitable because color changes with the seasons while shape remains unchanged. They employed both wavelets and the Pyramid Histogram of Oriented Gradients (PHOG) algorithm to extract shape-based features.

Texture. Texture is another major field of study in plant identification, as it is considered to be one of the most important visual attributes of plant images. It is used to describe the surface of the leaf based on the pixel distribution over a region. One of the earliest studies, Backes & Bruno (2009), applied multi-scale fractal dimension to plant classification; basically, they performed texture characterization based on complexity analysis. Next, Cope et al. (2010b) proposed using Gabor co-occurrences in plant texture classification. They analyzed texture based on joint distributions from Gabor filters of different scales. Rashad et al. (2011) employed a combined classifier – Learning Vector Quantization (LVQ) together with the Radial Basis Function (RBF) – to classify and recognize plants based on textural features. Olsen et al. (2015) proposed using a rotation- and scale-invariant HOG feature set to represent regions of texture within leaf images. Techniques based on LBP have drawn the attention of researchers due to their computational simplicity as well as outstanding texture analysis performance. Several researchers have proposed using LBP-based approaches to extract texture from leaves for classification. Naresh & Nagendraswamy (2016) modified the conventional LBP approach to consider the structural relationship between neighboring pixels, replacing the hard-threshold approach of basic LBP. Tang et al. (2015) introduced a new texture extraction method based on the combination of GLCM and LBP to classify tea leaves.

Venation. Identification of leaf species from their venation structure is much used by botanists. In computer vision, Charters et al. (2014) designed a novel descriptor called EAGLE. It comprises five sample patches that are arranged to capture and extract the spatial relationships between local areas of venation. They showed that a combination

of EAGLE and SURF was able to boost the discriminative ability of the feature representation. Larese et al. (2014) recognised legume varieties based on leaf venation. They first segmented the vein pattern using the Unconstrained Hit-or-Miss Transform (UHMT), then used LEAF GUI measures to extract a set of features for veins and areoles. The latest study, Grinblat et al. (2016), attempted deep learning in plant identification using vein morphological patterns. They first extracted the vein patterns using UHMT, and then trained a CNN to recognise them using a central patch of leaf images. The CNN was used for feature extraction as well as classification, replacing previous work (Larese et al., 2014) that employed LEAF GUI and Random Forests (RF).

A considerable amount of research has investigated using combinations of features to represent leaves. For example, Beghin et al. (2010) and Chaki et al. (2015) introduced a combination of shape and texture features to identify a plant, while Kadir et al. (2013) proposed using additional color features. A summary of our literature review is provided in Table 2.1.

Table 2.1: Summary of related studies.

Publications            Year  Method                                       Shape  Texture  Color  Venation
Neto et al. (2006)      2006  Elliptic Fourier + Discriminant analyses       X      -       -       -
Du et al. (2007)        2007  Geometrical calculation + Moment invariants    X      -       -       -
Backes & Bruno (2009)   2009  Multi-scale fractal dimension                  -      X       -       -
Cope et al. (2010b)     2010  Gabor Co-Occurrences                           -      X       -       -
Xiao et al. (2010)      2010  HOG + MMC                                      X      -       -       -

Table 2.1 Continued.

Publications              Year  Method                                                   Shape  Texture  Color  Venation
Beghin et al. (2010)      2010  Contour signature + Sobel                                  X      X       -       -
Chaki & Parekh (2011)     2011  Hough, Fourier and Moment invariants + Centroid-radii model  X    -       -       -
Rashad et al. (2011)      2011  LVQ + RBF                                                  -      X       -       -
Mouine et al. (2012)      2012  Advanced SC + Edge Oriented Histogram                      X      -       -       -
Cope & Remagnino (2012)   2012  DTW (leaf margin)                                          X      -       -       -
N. Kumar et al. (2012)    2012  HoCS                                                       X      -       -       -
Ma et al. (2013)          2013  Wavelet + PHOG                                             X      -       -       -
Kadir et al. (2013)       2013  Geometrical calculation + Polar Fourier Transform (Shape)  X      X       X       -
                                + Color moments (Color) + Fractal measure lacunarity (Texture)
Charters et al. (2014)    2014  EAGLE                                                      -      -       -       X
Larese et al. (2014)      2014  UHMT + LEAF GUI                                            -      -       -       X
Aakif & Khan (2015)       2015  Geometrical calculation + Fourier descriptors + SDF        X      -       -       -

Table 2.1 Continued.

Publications                    Year  Method                                                 Shape  Texture  Color  Venation
Kalyoncu & Toygar (2015)        2015  Margin descriptors + Moment Invariants                   X      -       -       -
                                      + Geometrical calculation
Hall et al. (2015)              2015  HCS + HoCS                                               X      -       -       -
Zhao et al. (2015)              2015  I-IDSC                                                   X      -       -       -
Tang et al. (2015)              2015  LBP + GLCM                                               -      X       -       -
Olsen et al. (2015)             2015  HOG                                                      -      X       -       -
Chaki et al. (2015)             2015  Gabor filter + GLCM + curvelet transform                 X      X       -       -
Naresh & Nagendraswamy (2016)   2016  Modified LBP                                             -      X       -       -
Grinblat et al. (2016)          2016  UHMT + CNN                                               -      -       -       X

Table 2.1 lists most of the studies since 2006, which focus on the use of shape features to identify leaf species. According to Table 2.1, shape features have been chosen and tested in almost 62.5% of plant identification studies, far exceeding the use of other features. This is because they are the easiest and most obvious features for distinguishing species, particularly for non-professionals who have limited knowledge of plant characters. Nevertheless, quite a number of publications used texture features as well (approximately 41.7% according to Table 2.1), given that some species are difficult or impossible to differentiate from one another using only shape features because of their similar leaf contours. Other than that, a few studies attempted to combine different features and tested their approaches with different leaf databases in order to increase their feature

representation. Although those techniques were shown to be successful, the performance of the aforementioned solutions is highly dependent on a chosen set of hand-engineered features. These hand-crafted features are liable to change with different leaf data and feature extraction techniques, which confounds the search for an effective subset of features to represent leaf samples in species recognition studies. With this background, this thesis provides a solution for the quantification of leaf features. It shows that by using a DL approach, leaf features can be learned directly from the raw representation of the leaf input data, obviating the need for designing hand-crafted features. Through visualisation techniques, one can gain insights into DL, exploring the features learned.

2.2 Multi-Organ Plant Identification

ImageCLEF, one of the foremost visual image retrieval campaigns, has been hosting a plant identification challenge since 2011. Since then, researchers have started to focus on the automatic analysis of multiple images exploiting different views of a plant capturing one or more organs. From 2014, PlantCLEF has provided up to seven different plant views, namely entire plant, branch, flower, leaf, leaf scan, fruit, and stem. Contrary to other approaches (Charters et al., 2014; N. Kumar et al., 2012; Larese et al., 2014) that analyze a single plant organ captured in one image, multi-organ classification approaches (Dimitrovski et al., 2014; Goëau et al., 2014; Paczolay et al., 2014; Yanikoglu et al., 2014) have been increasingly explored to address this new challenge. Researchers generally adopt organ-specific features for discrimination. Organ-specific features in this context are representations specifically designed for a particular plant organ. Specifically, they first group the images of plants into their respective organ categories.
Then, based on each organ category, organ-specific features are extracted using feature engineering approaches. These organ-specific features can be the same across several types of plant organs or different for each type of plant organ.
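This routing amounts to a dispatch table mapping each organ category to its descriptor. A minimal sketch follows; the descriptor functions are crude placeholders invented for illustration, not the actual descriptors used in the systems reviewed below.

```python
import numpy as np

def margin_profile(img):
    """Placeholder for a leafscan shape descriptor (column-wise mean profile)."""
    return np.asarray(img, dtype=float).mean(axis=0)

def colour_histogram(img):
    """Placeholder for a flower/fruit colour descriptor (8-bin intensity histogram)."""
    return np.histogram(np.asarray(img), bins=8, range=(0, 256))[0]

# Each organ category is routed to the descriptor designed for it.
DESCRIPTOR_BY_ORGAN = {
    "leafscan": margin_profile,
    "flower": colour_histogram,
    "fruit": colour_histogram,
}

def extract_features(dataset):
    """dataset: iterable of (organ_label, image); returns features grouped by organ."""
    features = {}
    for organ, img in dataset:
        descriptor = DESCRIPTOR_BY_ORGAN[organ]
        features.setdefault(organ, []).append(descriptor(img))
    return features

dataset = [
    ("flower", np.full((4, 4), 200)),
    ("leafscan", np.arange(16).reshape(4, 4)),
]
features = extract_features(dataset)
```

The per-category grouping makes explicit the weakness discussed next: every new organ category needs its own, manually chosen, entry in the table.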

Dimitrovski et al. (2014) introduced the triangle side lengths and angle (TSLA) descriptor (Mouine et al., 2013) for the leafscan category and the opponent SIFT descriptor for the rest of the plant views. An approximate k-means (AKM) algorithm was then employed for clustering these descriptors and producing a large number of visual words. Goëau et al. (2014) used a large-scale matching approach (SURF, Fourier2D, rotation-invariant LBP, Edge Orientation Histogram (EOH), weighted RGB (w-RGB), weighted LUV (w-LUV) and HSV histograms) for the flower, leaf, fruit, stem, branch and entire views of a plant. However, for the leafscan category, additional shape descriptors and geometric parameters on the shape boundary were used. Yanikoglu et al. (2014) used shape and texture features such as the Circular Covariance Histogram (CCH), Rotation Invariant Point Triplets (RIT), Orientation Histogram (OH), Color Auto-correlogram, etc. for the leafscan category. For the stem category, they used the same global texture and color descriptors as for the leafscan category (CCH, OH, RIT), with an additional morphological covariance descriptor. For the flower, fruit and entire categories, they used a BOW approach. In each system, they used SVM classifiers to predict a list of ranked species. On the other hand, for the specific categories of branch and leaf, they used a CNN approach. Paczolay et al. (2014) extracted a vein density description for the leafscan category using shape parameters and the cumulative histogram representation of Multiscale Triangular shape descriptors (Mouine et al., 2013). Then, a Random Forest classifier was employed for species classification. For the other categories, they employed a Color-Gradient Histogram (CGH) for feature representation and k-nearest neighbour (KNN) classifiers for species classification. Table 2.2 shows a summary of the organ-specific feature extraction approaches.
Although reliable performance was reported, designing or deciding which feature descriptors to use for each plant view (organ) depends heavily on one's prior knowledge of plant organs, and this kind of information is usually only partially available or incomplete for non-specialist users. Since DL has proved to offer extremely high recognition capability on very large datasets, removing the need for human hand-design, Ge et al. (2015) proposed using an end-to-end CNN to replace these handcrafted feature extractors. They introduced organ-specific CNN models, where each model is trained on dedicated plant organs. For example, as shown in Table 2.2, they defined four different subsets of organ categories (A, B, C, D), each trained in a separate organ-specific CNN. However, although a CNN is powerful in learning discriminative features, constraining it to learn on specific organ categories might restrict its performance.

Besides using organ-specific features for the different plant view categories, some researchers employed generic features for all categories, irrespective of organ information. They do not pre-categorise the plant images into their respective organ categories and extract a dedicated feature representation for each; instead, they represent all images using the same feature representation. For example, Szűcs et al. (2014) employed a Gaussian Mixture Model (GMM) based Fisher vector (FV) representation computed from dense SIFT features for all categories. Chen et al. (2014) first extracted dense SIFT and Color Moments (CMs) from every plant image, then modeled each feature with GMMs to produce the Fisher vector representation; finally, a linear SVM was learned for each feature. Fakhfakh et al. (2014) used a Harris detector, Haar wavelets and color histograms to extract features from all plant images, and then used the Hamming distance to search for the relevant images in the training dataset. Karamti et al. (2014) used three descriptors for feature extraction: the Color Layout Descriptor (CLD), EOH and a Scalable Color Descriptor (SCD).
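The GMM-based Fisher vector encoding used by Szűcs et al. (2014) and Chen et al. (2014) can be sketched in simplified form as follows. This sketch keeps only the gradient with respect to the GMM means (the full encoding also includes mixture-weight and covariance terms), and the 16-D descriptors are illustrative stand-ins for 128-D dense SIFT.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector_means(gmm, descriptors):
    """Simplified Fisher vector: normalised gradient of the image
    log-likelihood w.r.t. the GMM means only."""
    q = gmm.predict_proba(descriptors)        # (N, K) soft assignments
    n, k = q.shape
    d = descriptors.shape[1]
    fv = np.zeros((k, d))
    for j in range(k):
        # Whitened deviation of each descriptor from component j's mean.
        diff = (descriptors - gmm.means_[j]) / np.sqrt(gmm.covariances_[j])
        fv[j] = (q[:, j:j + 1] * diff).sum(axis=0) / (n * np.sqrt(gmm.weights_[j]))
    return fv.ravel()                         # (K * d,) image encoding

# Fit the vocabulary GMM on pooled training descriptors (stand-in data).
rng = np.random.default_rng(0)
train_desc = rng.normal(size=(2000, 16))
gmm = GaussianMixture(n_components=8, covariance_type="diag",
                      random_state=0).fit(train_desc)

# Encode one image's descriptor set into a single fixed-length vector.
fv = fisher_vector_means(gmm, rng.normal(size=(300, 16)))
```

The resulting fixed-length vector is what a linear SVM is then trained on, one classifier per feature channel in Chen et al.'s setup.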
They then attempted to use the contextual content of the XML document associated with each image, with textual and structural representations. Lately, inspired by the CNN breakthrough in image classification (Krizhevsky et al., 2012; Simonyan & Zisserman, 2015), Champ et al. (2015); Choi (2015); E. M. M. Ghazi et al. (2015); Reyes et al. (2015) employed CNN-based approaches, which have become among the most widely used

Table 2.2: Summary of the organ-specific feature extraction approaches.

Publications | Leafscan | Other plant views (leaf, stem, fruit, branch, flower, entire)
Dimitrovski et al. (2014) | TSLA | SIFT features
Goëau et al. (2014) | SURF, Fourier2D, LBP, EOH, w-RGB, w-LUV, HSV + shape and geometry descriptors | SURF, Fourier2D, LBP, EOH, w-RGB, w-LUV and HSV
Yanikoglu et al. (2014) | CCH, RIT, OH, Color Auto-correlogram, etc. | Stem: same descriptors + morphological covariance descriptor; Leaf and Branch: CNN; Fruit, Flower and Entire: BOW
Paczolay et al. (2014) | vein density, shape | CGH
Ge et al. (2015) | organ-specific CNN | organ-specific CNNs trained on organ subsets (A, B, C, D)

Such DL methods learn a generic representation for images of plants: specifically, an M-class species classifier is trained irrespective of the organ or organ structure. In (Choi, 2015), training a generic network with a deeper architecture, GoogLeNet, produced the best result on the PlantCLEF 2015 dataset (Joly et al., 2015). From these findings, it is undeniable that CNN features learned end-to-end, without imposed heuristic rules, are more powerful and distinctive for representing plant view images of large variability in appearance. In Table 2.3, one can observe that DL techniques are now the most frequently used, replacing the hand-engineering approaches. However, although the generic features learned can model the target species classes, they might not provide an appropriate description of a plant, as a generic network may learn irrelevant features, especially when these appear to be discriminative among species. For this reason, this thesis proposes a new CNN architecture that goes beyond the regular generic description of a plant, integrating organ-specific features with generic features to explicitly force the network to focus on the organ regions during species classification.

Table 2.3: Summary of the generic feature extraction approaches.

Publications | Feature Extraction
Szűcs et al. (2014) | SIFT and CMs
Fakhfakh et al. (2014) | Harris detector, Haar wavelet and color histogram
Karamti et al. (2014) | CLD, EOH and SCD
Reyes et al. (2015) | CNN
Choi (2015) | CNN
Champ et al. (2015) | CNN
E. M. M. Ghazi et al. (2015) | CNN

Apart from that, notice that the plant images, whether modeled by separate organ-specific networks or by a single generic network, are not learned in a way that models plant structures.
In such cases, the representation obtained focuses only on the local information in an image but fails to capture the structural information composed of different plant views (organs). The relation between plant structures is important, as the images

captured from the same plant contain structural information that is not mutually exclusive. In fact, they share overlapping characteristics that are useful for species recognition. For example, Fig. 1.3(a) and Fig. 1.3(b) show plant view images of two different plant entities, tagged with ObservationId 14982 and 6840 respectively in the PlantClef2015 dataset (Joly et al., 2015). It is noticeable that plant view images of the same plant exhibit correlated or overlapping characteristics in their organ structures, whereas the characteristics of Fig. 1.3(a) and Fig. 1.3(b) are obviously very different from each other. Henceforth, this thesis ventures into a new alternative, moving beyond existing practice and proposing a new plant classification framework that takes into account the contextual dependencies between varying plant views capturing one or more organs of a plant.

CHAPTER 3: AUTOMATED PLANT IDENTIFICATION

This chapter introduces an automated plant identification system based on DL. It begins with an introduction to DL. Next, it introduces the idea of using DL for automatic processing and classification in order to learn and discover useful features from leaf data. The universal occurrence of variability in natural object kinds, including species, will be described, showing first how it can confound the classification task, but also how it can be exploited to provide better solutions by using DL.

3.1 Deep Learning

DL is a family of machine learning methods consisting of multiple processing layers that learn representations of data at multiple levels of abstraction. It belongs to the family of artificial neural networks, which estimate or learn initially unknown functions from large amounts of input data. DL can be seen as a system of interconnected "neurons" that exchange messages with each other. These messages are passed through connections controlled by numerical weights learned through experience. In essence, the gist of DL is its capacity to create and extrapolate new features from raw representations of input data without having to be told explicitly which features to use or how to extract them.

Nowadays, DL has been gradually reducing the importance of feature engineering approaches (LeCun et al., 2015). The main reason is that engineering features is tedious and time-consuming, since only the features relevant for optimal data modeling should be extracted, yet the usefulness of these features is always unknown until several training and testing phases have been carried out. Since DL was proven capable of handling very large sets of high-dimensional data, replacing the need for human hand-design, DL methods have become increasingly present in various applications of

computer vision (Jiang et al., 2015). It has outstripped the SOTA in speech recognition, visual object recognition and object detection, and it has also produced promising results in sequential data processing, particularly in natural language understanding tasks such as image captioning, question answering and language translation.

In the plant identification domain, numerous studies have focused on procedures or algorithms that maximise the use of leaf databases, and this has led to the norm that leaf features are liable to change with different leaf data and feature extraction techniques. Heretofore, the ambiguity surrounding the features that best represent the leaf data has remained unresolved. Hence, in the present study, instead of delving into the creation of feature representations as in previous approaches (Cope et al., 2010b; N. Kumar et al., 2012; Naresh & Nagendraswamy, 2016), this work reverse-engineers the process by asking DL to interpret and elicit the particular features that best represent the leaf data. By means of the interpretation results, one can perceive the cognitive complexities of vision for leaves, reflecting the tacit knowledge researchers intuitively deploy in their imaginative vision from the outset.

3.2 Features Exploration

This section explains the methodology employed to interpret the best features of a leaf. It first investigates one of the DL architectures, namely the CNN, to learn a robust representation of leaf images, and subsequently ventures into each CNN layer, interpreting its neuron computations to quantify the prerequisite features for leaf representation. Fig. 3.1 depicts the overall framework of the proposed approach.

Figure 3.1: The proposed DL framework shown in a bottom-up and top-down way to study and understand plant identification. Best viewed in electronic form.

Figure 3.2: The AlexNet CNN architecture used for plant identification. Best viewed in electronic form.

3.2.1 Convolutional Neural Networks

Firstly, the AlexNet CNN architecture (Krizhevsky et al., 2012) is re-used to learn the representation of leaf images. The reasons are: 1) it is widely known that features extracted from the activations of a CNN trained in large-scale object recognition studies can be re-purposed for a novel generic task (Donahue et al., 2014); 2) the leaf training set employed in this work is not as large as the ILSVRC2012 dataset, and, as indicated in (Dong et al., 2014), the performance of a CNN model is highly dependent on the size and level of diversity of the training set, hence the network is initially pre-trained on the big and diverse ILSVRC2012 dataset before being fine-tuned on the leaf training set; and 3) among the many object classification networks, the most lightweight and simple network structure is selected to test the concept.

For the CNN model, fine-tuning is performed using a 44-class leaf dataset collected
