
Academic year: 2022


PERFORMANCE ANALYSIS OF NOVEL SPEECH BASED PSYCHOLOGICAL ASSESSMENT TOOL

USING BAHASA MALAYSIA

BY

HUDA BINTI AZAM

A thesis submitted in fulfillment of the requirement for the degree of Master of Science (Mechatronics Engineering)

Kulliyyah of Engineering

International Islamic University Malaysia

OCTOBER 2017


ABSTRACT

Major depressive disorder is a growing global cause for concern. For the past few decades, researchers have worked on automatic, objective screening mechanisms based on biometric parameters in the hope of reducing its prevalence. One possible parameter for diagnosing psychological state is speech, which depends on many factors such as language and speaker. However, to date, no research has used the speech characteristics of native Bahasa Malaysia speakers. This research therefore sought to identify acoustic features that can serve as indicators of depression in Bahasa Malaysia speech. Since the characteristics of male and female speech differ, the data were analysed separately. We obtained clinically validated data from six depressed and ten healthy male subjects, and from seven depressed and ten healthy female subjects. Four types of acoustic features were extracted, namely Mel Frequency Cepstral Coefficient (MFCC), Power Spectral Density (PSD), Transition Parameters and Interval Length Probability Density Function (PDF). Linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA) were used to obtain the decision boundary for pairwise classification, with jack-knife and cross-validation resampling. Using only a single feature to train the classifier model, we found that the first cepstral coefficient, MFCC-C1, outperformed the rest, with 93.2% accuracy when classified using LDA for male data and 95% accuracy when classified using both LDA and QDA for female data. Remarkably, this suggests that the information contained in MFCC-C1 is robust across gender. To find the optimum feature combination, the features were then combined, up to a maximum of three per combination. For both datasets, 100% accuracy was achieved by classifiers trained on combinations of two features. Considering the difficulty of acquiring a depression speech database, more effort should be put into collecting more data and validating the analysis. This work supports the feasibility of the desired objective diagnostic tool, as it shows that there are distinctive patterns in the depressed and healthy datasets of Bahasa Malaysia speakers.

ABSTRACT IN ARABIC

Depressive disorder is a growing and worrying global issue. In an effort to reduce the statistics, researchers over recent decades have worked on producing an automatic mechanism for monitoring mental health using biometric parameters. One of the most important of these parameters for diagnosing psychological state is speech, which depends on multiple factors such as language and speaker. However, to date, no research has been carried out using the speech characteristics of native Malay speakers. This research therefore sought to identify possible acoustic features that can be used as indicators of depression in Malay speech. Because the acoustic characteristics of men and women differ, the data were analysed separately. Clinically validated data were obtained from six (6) depressed and ten (10) healthy male subjects, and from seven (7) depressed and ten (10) healthy female subjects. Four types of acoustic features were extracted, namely Mel Frequency Cepstral Coefficient (MFCC), Power Spectral Density, Transition Parameters and Interval Length Probability Density Function. For the decision boundary of the pairwise classifier, linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA) were used, with jack-knife and cross-validation resampling. Using a single feature to train the classifier model, the first MFCC coefficient was found to outperform the remaining features, with 93.2% accuracy when classified using LDA for the male data and 95% accuracy when classified using LDA and QDA for the female data. Remarkably, it can be concluded that the information contained in the first MFCC coefficient is robust across both genders. To obtain the optimum feature combination, the features were then combined up to a set maximum number. For both datasets, 100% accuracy was achieved by classifiers based on two features. Considering the difficulties inherent in collecting speech data from depressed subjects, more effort is required to collect further data for validation and analysis. This work represents a significant effort towards the desired objective diagnostic tool, as it shows that there are distinctive healthy and depressed patterns in samples of Malay speakers.


APPROVAL PAGE

I certify that I have supervised and read this study and that in my opinion, it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a thesis for the degree of Master of Science (Mechatronics Engineering).

………..

Nik Nur Wahidah Nik Hashim

Supervisor

………..

Wahju Sediono

Co-Supervisor

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a thesis for the degree of Master of Science (Mechatronics Engineering).

………..

Noor Hazrin Hany Mohamad Hanif

Internal Examiner

………..

Lee Poh Foong

External Examiner

This thesis was submitted to the Department of Mechatronics Engineering and is accepted as a fulfillment of the requirement for the degree of Master of Science (Mechatronics Engineering).

………..

Syamsul Bahrin Abdul Hamid

Head, Department of Mechatronics Engineering

This thesis was submitted to the Kulliyyah of Engineering and is accepted as a fulfillment of the requirement for the degree of Master of Science (Mechatronics Engineering).

………..

Erry Yulian Triblas Adesta

Dean, Kulliyyah of Engineering


DECLARATION

I hereby declare that this thesis is the result of my own investigations, except where otherwise stated. I also declare that it has not been previously or concurrently submitted as a whole for any other degrees at IIUM or other institutions.

Huda binti Azam

Signature ... Date ...


COPYRIGHT PAGE

INTERNATIONAL ISLAMIC UNIVERSITY MALAYSIA

DECLARATION OF COPYRIGHT AND AFFIRMATION OF FAIR USE OF UNPUBLISHED RESEARCH

PERFORMANCE ANALYSIS OF NOVEL SPEECH BASED PSYCHOLOGICAL ASSESSMENT TOOL USING BAHASA MALAYSIA

I declare that the copyright of this dissertation is jointly owned by the student and IIUM.

Copyright © 2017 (Huda binti Azam) and International Islamic University Malaysia. All rights reserved.

No part of this unpublished research may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission of the copyright holder, except as provided below:

1. Any material contained in or derived from this unpublished research may be used by others in their writing with due acknowledgement.

2. IIUM or its library will have the right to make and transmit copies (print or electronic) for institutional and academic purposes.

3. The IIUM library will have the right to make, store in a retrieval system and supply copies of this unpublished research if requested by other universities and research libraries.

By signing this form, I acknowledge that I have read and understood the IIUM Intellectual Property Right and Commercialization policy.

Affirmed by Huda binti Azam

……..……….. ………..

Signature Date


ACKNOWLEDGEMENTS

All praise is due to Allah, the all Knower, and the all Wise.

Firstly, I would like to express my sincere gratitude to Dr. Nik Nur Wahidah Nik Hashim for her continuous support, encouragement, patience and love. She has always been there for me and I could never have imagined having a better supervisor and mentor than her. My sincere thanks also go to my co-supervisor, Dr.-Ing. Wahju Sediono, whose support and cooperation ever since my undergraduate years contributed to the outcome of this work.

I would also like to acknowledge the psychiatrists and clinical psychologists, Dr Firdaus Mukhtar, Dr Normala Ibrahim, Dr Salina Abdul Aziz and Dr Syarifah Suziah Syed Mokhtar, who helped me a great deal in the data collection process. Without their precious support, this research could not have been completed. Furthermore, to the patients and participants who spared their valuable time to make this research possible: thank you, and I pray for your speedy recovery.

I take this opportunity to express my deep gratitude to my fellow brothers and sisters of the Biomechatronics laboratory, for the stimulating discussions, for all the fun and joy we had and for always being my shoulders to lean on. This journey would have been so dull without all of you. My grateful thanks are also extended to all lecturers, staff and postgraduate students of the Kulliyyah of Engineering, IIUM, for their constructive criticism and guidance.

It is my utmost pleasure to dedicate this work to my family: my dear parents, Azam Abas and Zarina Jais, my brothers and sisters, who granted me the gift of their unwavering belief in my ability to accomplish this goal: thank you for your support and patience. I can only ask Allah to reward them immensely.


TABLE OF CONTENTS

Abstract
Abstract in Arabic
Approval Page
Declaration
Copyright Page
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Abbreviations

CHAPTER ONE: INTRODUCTION
1.1 Research Motivation
1.2 Problem Statement
1.3 Research Objectives
1.4 Research Methodology
1.5 Scope of Research
1.6 Organization of Thesis

CHAPTER TWO: BACKGROUND AND LITERATURE REVIEW
2.1 Introduction
2.2 Mechanism of Speech Production
2.3 Depressive Speech
2.3.1 Depressive Speech Corpora
2.4 Previous Analysis of Features in Depressed Speech
2.4.1 Prosodic
2.4.2 Source
2.4.3 Spectral
2.4.4 Formants
2.4.5 Mel Frequency Cepstral Coefficient
2.4.6 Related Works Using Combination of Features
2.5 Summary

CHAPTER THREE: METHODOLOGY
3.1 Introduction
3.2 Database Formulation
3.2.1 Data Collection
3.2.2 Pre-processing
3.3 Voiced, Unvoiced and Silence Detection
3.4 Feature Extraction
3.4.1 Power Spectral Density
3.4.2 Mel Frequency Cepstral Coefficient
3.4.3 Transition Matrix
3.4.4 Interval Length Probability Density Function
3.5 Classification
3.5.1 Quadratic and Linear Classifier
3.5.2 Resampling Methods
3.6 Performance Measures
3.6.1 Accuracy
3.6.2 Sensitivity
3.6.3 Specificity
3.7 Summary

CHAPTER FOUR: RESULT
4.1 Introduction
4.2 Classification Performance Using Male Data
4.2.1 Classification using Mel Frequency Cepstral Coefficient
4.2.2 Classification using Power Spectral Density
4.2.3 Classification using Markov Transition Matrix
4.2.4 Classification using Interval Length PDF
4.3 Classification Performance Using Female Data
4.3.1 Classification using Mel Frequency Cepstral Coefficient
4.3.2 Classification using Power Spectral Density
4.3.3 Classification using Markov Transition Matrix
4.3.4 Classification using Interval Length PDF
4.4 Comparative Analysis of Inter-Features
4.4.1 Comparison of Performance using a Single Feature in Male Data
4.4.2 Comparison of Intra- and Inter-features Combination in Male Data
4.4.3 Comparison of Performance using a Single Feature in Female Data
4.4.4 Comparison of Intra- and Inter-features Combination in Female Data
4.5 Comparison with Existing Models
4.6 Summary

CHAPTER FIVE: CONCLUSIONS AND RECOMMENDATIONS
5.1 Introduction
5.2 Summary
5.3 Contributions
5.4 Limitations
5.5 Future Work

REFERENCES

PUBLICATIONS

APPENDIX A: LITERATURE REVIEW COMPARATIVE TABLE
APPENDIX B: BECK HOPELESSNESS SCALE
APPENDIX C: BECK DEPRESSION INVENTORY-MALAY

LIST OF TABLES

Table 2.1 Summary of psychological impairment speech corpora used by previous researchers
Table 3.1 Database Information
Table 3.2 3dB band limits of the band pass filter corresponding to the output of each band level
Table 3.3 PSD band's frequency range
Table 4.1 Summary of statistical data of MFCC in male speech
Table 4.2 The selected classification result of MFCC using LDA for male data
Table 4.3 The selected classification result of MFCC using QDA for male data
Table 4.4 Summary of statistical data of spectral energy ratios in male speech
Table 4.5 The selected classification result of PSD using LDA for male data
Table 4.6 The selected classification result of PSD using QDA for male data
Table 4.7 Summary of statistical data of the nine Transition parameters of male data
Table 4.8 The selected classification result of Transition matrix using LDA for male data
Table 4.9 Summary of statistical data of silence and voiced intervals in male speech
Table 4.10 The selected classification result of Interval length PDF using LDA for male data
Table 4.11 The selected classification result of Interval length PDF using QDA for male data
Table 4.12 Summary of statistical data of MFCC in female speech
Table 4.13 The selected classification result of MFCC using LDA for female data
Table 4.14 The selected classification result of MFCC using QDA for female data
Table 4.15 Summary of statistical data of spectral energy ratios in female speech
Table 4.16 The selected classification result of PSD for female data
Table 4.17 Summary of statistical data of the nine Transition parameters of female data
Table 4.18 The selected classification result of Transition matrix using LDA for female data
Table 4.19 The selected classification result of Transition matrix using QDA for female data
Table 4.20 Summary of statistical data of silence and voiced intervals in female speech
Table 4.21 The selected classification result of Interval length PDF using LDA for female data
Table 4.22 The selected classification result of Interval length PDF using QDA for female data
Table 4.23 The best classifier model within feature group in male
Table 4.24 The best classifier models with combination of intra- and inter-features
Table 4.25 Combination of features that produce 100% accuracy in male data
Table 4.26 The best classifier model within feature group in female
Table 4.27 The best classifier models with combination of intra- and inter-features in female data
Table 4.28 Combination of features that produce 100% accuracy in female data
Table 4.29 Performance comparison of existing models against proposed model

LIST OF FIGURES

Figure 1.1 Percentage of population diagnosed with clinical depression (Ferrari et al., 2013)
Figure 1.2 Flow chart of the methodology
Figure 2.1 Schematic view of human speech production (Flanagan, 1972)
Figure 2.2 Speech production model (Flanagan, 1972)
Figure 3.1 Methodology of mood class determination
Figure 3.2 PSD feature extraction
Figure 3.3 Block diagram of MFCC extraction
Figure 3.4 An example of total filters' frequency response of data D150001FM.wav
Figure 3.5 An example of MFCC discrete cosine transform matrix of data D150001FM.wav
Figure 3.6 Configuration of transition matrix feature extraction
Figure 3.7 Markov model's graphical representation, where voiced, unvoiced and silence are labelled as V (1), UV (2) and S (3)
Figure 3.8 Graphical representation of interval length and state transition of a sampled speech signal
Figure 3.9 Voiced and silence interval PDF distribution
Figure 4.1 Comparison of classification results between LDA and QDA with jack-knife resampling method using MFCC for male data in terms of Sensitivity
Figure 4.2 Comparison of classification results between LDA and QDA with jack-knife resampling method using MFCC for male data in terms of Specificity
Figure 4.3 Plot of male depressed and healthy speakers distribution for the combined feature set of MFCC C1 with C11 using linear and quadratic discriminant analysis
Figure 4.4 Comparison of classification results between LDA and QDA with jack-knife resampling method using Power Spectral Density for male data in terms of Sensitivity
Figure 4.5 Comparison of classification results between LDA and QDA with jack-knife resampling method using Power Spectral Density for male data in terms of Specificity
Figure 4.6 Comparison of classification results in terms of Sensitivity and Specificity of Markov Transition Matrix using LDA with jack-knife resampling method in male data
Figure 4.7 Comparison of classification results between LDA and QDA with jack-knife resampling method using Interval Length PDF for male data in terms of Sensitivity
Figure 4.8 Comparison of classification results between LDA and QDA with jack-knife resampling method using Interval Length PDF for male data in terms of Specificity
Figure 4.9 Plot of depressed and healthy speakers distribution for the combined feature set of Interval length PDF of band Silence-2 with Voiced-2 using linear and quadratic discriminant analysis
Figure 4.10 Comparison of classification results between LDA and QDA with jack-knife resampling method using MFCC for female data in terms of Sensitivity
Figure 4.11 Comparison of classification results between LDA and QDA with jack-knife resampling method using MFCC for female data in terms of Specificity
Figure 4.12 Plot of female depressed and healthy speakers distribution for the combined feature set of MFCC C1 with C11 using linear and quadratic discriminant analysis
Figure 4.13 Comparison of classification results between LDA and QDA with cross validation resampling method using PSD for female data in terms of Sensitivity
Figure 4.14 Comparison of classification results between LDA and QDA with cross validation resampling method using PSD for female data in terms of Specificity
Figure 4.15 Comparison of classification results of LDA and QDA with jack-knife resampling method using Markov Transition Matrix for female data in terms of Sensitivity
Figure 4.16 Comparison of classification results of LDA and QDA with jack-knife resampling method using Markov Transition Matrix for female data in terms of Specificity
Figure 4.17 Plot of depressed and healthy speakers distribution for the combined feature set of Unvoiced-to-Voiced (t21) with Unvoiced-to-Unvoiced (t22) using linear and quadratic discriminant analysis
Figure 4.18 Comparison of classification results of LDA and QDA with cross-validation resampling method using Interval Length PDF for female data in terms of Sensitivity
Figure 4.19 Comparison of classification results of LDA and QDA with cross-validation resampling method using Interval Length PDF for female data in terms of Specificity
Figure 4.20 Plot of depressed and healthy speakers distribution for the combined feature set of Silence-4 with Voiced-4 using LDA and QDA
Figure 4.21 Feature group-based sensitivity and specificity result trained using a single feature in male data
Figure 4.22 Feature group-based sensitivity and specificity result trained using a single feature in female data


LIST OF ABBREVIATIONS

FFT Fast Fourier Transform

BDI Beck Depression Inventory

BHS Beck Hopelessness Scale

FN False Negative

FP False Positive

HKJ Hospital Kajang

HKL Hospital Kuala Lumpur

LDA Linear Discriminant Analysis

MDD Major Depressive Disorder

MFCC Mel Frequency Cepstral Coefficient

PDF Probability Density Function

PSD Power Spectral Density

QDA Quadratic Discriminant Analysis

SVM Support Vector Machine

TN True Negative

TP True Positive

TR Transition Matrix

WHO World Health Organization


CHAPTER ONE

INTRODUCTION

1.1 RESEARCH MOTIVATION

Depression is the primary cause of morbidity worldwide. Among the tell-tale signs of depression are intense sadness, lowering of mood, lethargy, observable psychomotor retardation and inability to carry out daily activities, for a span of a fortnight or longer.

It affects how people feel, think and behave, and can result in a variety of emotional and physical problems. According to statistics from the World Health Organization (WHO), over 350 million people of all ages struggle with depression (Chisholm, Yasamy, Marcus, Ommeren, & Saxena, 2012). Figure 1.1 shows the percentage of the population diagnosed as being depressed, with the Middle East and North Africa having the highest percentage. Malaysia is also facing an increasing trend in this issue.

According to the National Health and Morbidity Survey (Malaysia) in 2011, 12% of Malaysians aged between 18 and 60 suffered from mental health issues (Institute for Public Health - Ministry of Public Health (Malaysia), 2011). The same survey in 2015 revealed a significant surge: 29.2 per cent of adults aged 16 and above (4.2 million) were struggling with mental health problems (Institute for Public Health - Ministry of Public Health (Malaysia), 2015). Depression usually presents as a primary disorder with co-morbid psychological problems, physical illnesses and, at its worst, suicidal behaviours (Hawton, Casañas I Comabella, Haw, & Saunders, 2013). Over the past 45 years, Malaysian suicide rates were reported to have increased by 60% (Malaysian Psychiatric Association, 2006).

Figure 1.1 Percentage of population diagnosed with clinical depression (Ferrari et al., 2013)

With the rate increasing steadily year by year, depression is expected to become the leading mental illness among Malaysians by 2020. One of the key efforts in the National Suicide Prevention Strategic Action Plan was to transfer mental health treatment from hospitals to community mental health centres to make it more available to the general public. The Health Ministry also aims to improve the ratio of psychiatrists to the population to 1:50,000 (Tan, 2014).

Notwithstanding that, a study by Wolters Kluwer Health (2015) reported that patients suffering from depression are more likely to turn to primary care physicians and outpatient general medical settings than to psychiatrists for diagnosis and treatment. Of Malaysian suicide attempters, 74% were reported not to know how to access counselling services, even though 53% of them had heard about such services from the media (Sinniah, Maniam, Oei, & Subramaniam, 2014).

Diagnosis using measurable biomarkers has not been fully embraced in mainstream psychiatry, despite the growing popularity of research in this area. This method requires the measurement and evaluation of biomarkers as indicators of psychological conditions (Jain, Hong, & Pankanti, 2001). The idea is not to replace clinical psychologists but rather to add an objective weight to the diagnosis. A number of studies have sought to correlate mental impairment with biomarkers such as stress levels (Sano & Picard, 2013), head movements (Leask, Park, Khana, & Dimambro, 2013), psychomotor symptoms (Fava & Kendler, 2000) and facial expression (Valstar et al., 2016).

Previous studies have suggested that depression is associated with distinctive speech patterns, such as decreased verbal activity and productivity, diminished prosody and monotony. These characteristics correlate with disturbances in the respiratory, laryngeal, resonance and articulatory systems, which are embedded in the acoustic signal. Using speech as the parameter in an automated depression detection system enables the system to be cheap, remote, non-invasive and non-intrusive.

The speech signal is a preferable biometric since recording a stream of audio data and extracting features from it is comparatively easier than other methods. Accurate detection from speech could lead to an objective diagnostic aid that helps clinicians to better diagnose psychological state. This method of distinction and prediction would be highly beneficial in real-world applications, as it is unobtrusive to patients and practicable for researchers and clinicians. It can also shorten the path from diagnosis to the right treatment.

1.2 PROBLEM STATEMENT

With the steadily increasing rate of mental health problems globally, it is desirable to have an early detection system that can effectively identify psychological state using biometric characteristics. In recent years, research into the objective classification of mood disorders, distinguishing depressed from control states, has been carried out in many countries and in various languages, but none has examined the speech characteristics of Bahasa Malaysia speakers. Language-specific testing is necessary since the speech signal depends on many factors, including language and speaker (Bhaykar, Yadav, & Rao, 2013). As such, thorough research into acoustic measurements that can identify psychological state through the speech of Bahasa Malaysia speakers is proposed.

1.3 RESEARCH OBJECTIVES

Given the problems raised in the previous section, this study aims to achieve the following objectives:

1. To investigate possible voice parameters that can be used as indicators of depressed and control states in Bahasa Malaysia speakers.

2. To evaluate the performance of speech features in identifying depressed and control states in native Bahasa Malaysia speakers.

1.4 RESEARCH METHODOLOGY

The research methodologies adopted in achieving the stated objectives are:

Stage 1: Comprehensive literature review

In the early stage, an extensive literature review was carried out, covering conventional methods of detecting emotions, the speech production mechanism, experimental setups, and the features and classification techniques employed by previous researchers. The problem statement and objectives of this study were formulated at this stage.

Stage 2: Design of experimental setup and data formulation

The case groups chosen for the depression class were cases seen at Hospital Kajang and Hospital Kuala Lumpur, with the help of four clinical psychologists. The experimental setup, including questionnaires and materials for interviews, was fixed at this stage.


Stage 3: Pre-processing, feature extraction and classification

The recordings were pre-processed for de-identification and for removal of unnecessary sounds such as door slams and sneezing. Engineered features were then extracted from the pre-processed raw speech signals to form feature vectors. The feature vectors were resampled and fed into classifier models for the classification analysis. For the classification, linear and quadratic discriminant analysis algorithms were used.
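The classification step described above can be illustrated with a minimal sketch. It assumes a single scalar feature per speaker, Gaussian class models, and that the jack-knife resampling corresponds to leave-one-out evaluation; the feature values, the function names and the class labels are hypothetical and are not taken from the thesis data. LDA is modelled here with one pooled variance shared by both classes, QDA with a separate variance per class.

```python
import math
from statistics import mean, variance

def log_gaussian(x, mu, var):
    """Log density of a 1-D Gaussian with mean mu and variance var."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mu) ** 2 / (2.0 * var)

def classify(x, train, pooled):
    """Assign x to the class with the highest discriminant score.

    train maps a class label to its list of training values.
    pooled=True uses one shared variance (LDA-style);
    pooled=False uses one variance per class (QDA-style).
    """
    n_total = sum(len(v) for v in train.values())
    stats = {label: (mean(v), variance(v), len(v)) for label, v in train.items()}
    # Pooled within-class variance, weighted by degrees of freedom.
    shared = sum((n - 1) * var for _, var, n in stats.values()) / (n_total - len(stats))
    scores = {}
    for label, (mu, var, n) in stats.items():
        v = shared if pooled else var
        # Log prior (class proportion) plus Gaussian log likelihood.
        scores[label] = math.log(n / n_total) + log_gaussian(x, mu, v)
    return max(scores, key=scores.get)

def leave_one_out(data, pooled):
    """Hold out each sample in turn, train on the rest, and predict it."""
    results = []
    for i, (x, y) in enumerate(data):
        train = {}
        for j, (xj, yj) in enumerate(data):
            if j != i:
                train.setdefault(yj, []).append(xj)
        results.append((y, classify(x, train, pooled)))
    return results

# Synthetic, well-separated feature values (NOT thesis data):
# six "depressed" and ten "healthy" speakers, mirroring the male group sizes.
depressed = [-2.1, -1.8, -2.4, -1.9, -2.2, -2.0]
healthy = [1.9, 2.2, 1.7, 2.1, 2.4, 1.8, 2.0, 2.3, 1.6, 2.5]
data = [(x, "depressed") for x in depressed] + [(x, "healthy") for x in healthy]

for name, pooled in (("LDA", True), ("QDA", False)):
    pairs = leave_one_out(data, pooled)
    correct = sum(1 for true, pred in pairs if true == pred)
    print(name, "correct:", correct, "of", len(pairs))
```

With real features the two classifiers may disagree, since QDA lets each class have its own spread and therefore a curved decision boundary, while the pooled variance in LDA yields a single threshold for a one-dimensional feature.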

Stage 4: Evaluation

The performance of the classifier model was evaluated using the metrics of accuracy, sensitivity and specificity. A statistical test was also carried out on the feature vectors to examine how the data are distributed between the depressed and healthy groups. The flow chart of the methodology of this research is shown in Figure 1.2.
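As a rough illustration of the three performance metrics, the sketch below computes them from true positive (TP), false positive (FP), true negative (TN) and false negative (FN) counts using their standard definitions, with "depressed" assumed to be the positive class. The function names and the example predictions are hypothetical and do not reproduce any result from the thesis.

```python
def confusion_counts(pairs, positive="depressed"):
    """Count TP, FP, TN, FN from (true_label, predicted_label) pairs."""
    tp = fp = tn = fn = 0
    for true, pred in pairs:
        if true == positive and pred == positive:
            tp += 1
        elif true == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def metrics(tp, fp, tn, fn):
    """Standard accuracy, sensitivity and specificity."""
    total = tp + fp + tn + fn
    return {
        # Fraction of all subjects classified correctly.
        "accuracy": (tp + tn) / total if total else float("nan"),
        # Fraction of depressed subjects correctly identified.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Fraction of healthy subjects correctly identified.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }

# Hypothetical outcome for six depressed and ten healthy speakers.
pairs = (
    [("depressed", "depressed")] * 5
    + [("depressed", "healthy")] * 1
    + [("healthy", "healthy")] * 9
    + [("healthy", "depressed")] * 1
)

tp, fp, tn, fn = confusion_counts(pairs)
for name, value in metrics(tp, fp, tn, fn).items():
    print(f"{name}: {value:.3f}")
```

Reporting sensitivity and specificity alongside accuracy matters when the groups are unequal in size, as accuracy alone can look high even if most depressed speakers are missed.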


Figure 1.2 Flow chart of the methodology

1.5 SCOPE OF RESEARCH

The scope of the research is restricted to the use of the speech signal as the biometric parameter for identifying psychological class, i.e. depressed or healthy. The database consists of spontaneous speech from adult native speakers of Bahasa Malaysia aged seventeen to sixty-five.

1.6 ORGANIZATION OF THESIS

The remainder of this thesis is organized as follows:

Chapter Two briefly provides background and basic concepts of the mechanism of speech production. The chapter also summarizes previous work by other researchers in the related field of study.

Chapter Three presents the proposed system. It begins by explaining the database formulation method. The methods used for feature extraction and feature classification are also explained in detail.

Chapter Four discusses the results obtained from the statistical analysis and the performance of the classifier models.

Chapter Five concludes the thesis with a discussion and possible future work based on this study.


CHAPTER TWO

BACKGROUND AND LITERATURE REVIEW

2.1 INTRODUCTION

The aims of this chapter are to provide readers with some theoretical concepts and to summarize previous work in the related field of study. It is divided into three subsections. The first subsection explains basic concepts of the mechanism and physiological aspects of speech production. The second gives an insight into the relationship between psychological state and speech, and into the characteristics of depressive speech. The last subsection presents previous analyses of features in depressed speech by other researchers.

2.2 MECHANISM OF SPEECH PRODUCTION

Producing speech requires three components: the respiratory system, the phonation system and the articulatory system (Osdaz, 2001). In brief, the lungs provide the energy source (air) in respiration, the vocal folds convert the energy into audible sound in phonation, and the articulators transform the sound into intelligible speech in articulation.

Figure 2.1 illustrates the cross-sectional view of the vocal tract showing speech organs that are involved in articulation.


Figure 2.1 Schematic view of human speech production (Flanagan, 1972)

Figure 2.2 illustrates a simplified block diagram of the physiological mechanism of speech production. In the respiratory system, muscle force pushes air from the lungs through the trachea, bronchi and other associated muscles. The air force is referred to as the excitation signal, which in the later systems causes the vocal cords to vibrate and propagates the energy to excite the oral and nasal openings. This system is responsible for the amplitude of the sound, duration, stress pattern and pauses (L. Rabiner & Juang, 1993).

Figure 2.2 Speech production model (Flanagan, 1972)
