MALAY SPEECH INTELLIGIBILITY TEST FOR DEAF CHILDREN:

Academic year: 2022

MALAY SPEECH INTELLIGIBILITY TEST FOR DEAF CHILDREN:

PHONEME RECOGNITION USING ARTIFICIAL NEURAL NETWORK

BY

ZULKHAIRI MOHD. YUSOF

A thesis submitted in fulfilment of the requirement for the degree of Doctor of Philosophy in

Management Information System

Kulliyyah of

Information Communication and Technology International Islamic University

Malaysia

FEBRUARY 2011

ABSTRACT

It is estimated that about 2000 deaf children are born each year in Malaysia. Most deaf Malaysian children have very poor speech intelligibility. Reduced intelligibility severely compromises communication and social interaction for affected individuals.

Although speech deficiencies in the deaf are quite difficult to overcome, learning to produce intelligible speech is not an impossible task. Studies have shown that deaf children receiving Cued Speech can acquire reasonable speech intelligibility, surpassing the majority of signing children in verbal language skills. A reliable measure of speech intelligibility for deaf children is required for several reasons: to provide an index of the severity of the speech disorder, to assist in treatment decisions, and to quantify changes that may result from intervention or treatment. This thesis investigates an approach to measuring the speech intelligibility of deaf Malaysian children.

The research discussed in this work starts with experiments on a practical Malay Speech Intelligibility Test (MSIT), suitable for use within training programmes for deaf Malaysian children. In this study, the speech intelligibility of deaf children is measured through their ability to say simple nonsense syllables (each consisting of a consonant and a vowel) for all 22 Malay consonants. The MSIT score should indicate how well these children can produce speech; the higher the score, the better their speech intelligibility.
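The scoring idea can be illustrated with a minimal sketch. The consonant list is the one given in Chapter One; the percentage formula, the fixed vowel /a/, and the function names `build_syllables` and `msit_score` are assumptions made for illustration, not the scoring procedure defined in the thesis.

```python
# Illustrative sketch only: the scoring formula below is an assumption,
# not the MSIT procedure specified in the thesis.

# The 22 Malay consonants as listed in Chapter One.
CONSONANTS = ["b", "c", "d", "f", "g", "h", "j", "k", "l", "m", "n",
              "p", "r", "s", "t", "v", "w", "y", "z", "sy", "ng", "ny"]


def build_syllables(vowel="a"):
    """Return one consonant-vowel nonsense syllable per Malay consonant."""
    return [consonant + vowel for consonant in CONSONANTS]


def msit_score(targets, heard):
    """Percentage of target syllables that the listener identified correctly."""
    if len(targets) != len(heard):
        raise ValueError("targets and heard must have the same length")
    correct = sum(t == h for t, h in zip(targets, heard))
    return 100.0 * correct / len(targets)


targets = build_syllables("a")
heard = list(targets)
heard[0] = "pa"  # hypothetical listener response: /ba/ heard as /pa/
print(round(msit_score(targets, heard), 1))
```

Under this assumed formula, a child whose 22 syllables are all identified correctly scores 100, and each misidentified syllable lowers the score by the same amount.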

The next course of action was to investigate a phoneme recognition system that would suit the MSIT. An artificial neural network was utilized to model effectively the distribution of feature vectors present in speech signals for classification. In a novel approach, speech spectrum images become the inputs to a three-layer MLP (Multi-layer Perceptron) neural network. The input feature sets for the intelligent phoneme identification were based on the intrinsic characteristics of Malay syllables shown in the spectrum image of the captured speech signal. The spectrum images were produced from widely used speech filter algorithms: Mel-frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP) and Relative Spectral Transform - Perceptual Linear Prediction (RASTA-PLP). The classifiers have been tested on utterances of twenty-two Malay phonemes from two male and two female child speakers. The performance of the system in recognizing Malay phonemes is measured and compared with the performance of human listeners. The successful development of the phoneme recognition system serves several purposes: (a) it will be one of the first methods employed to objectively measure the speech intelligibility of deaf Malaysian children, and (b) it will contribute to better assessment and management of intervention programmes for deaf Malaysian children.
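As a rough illustration of the classification step, the sketch below runs one forward pass of a three-layer perceptron over a flattened spectrum image and returns one probability per phoneme class. The image size (8 x 12), the hidden layer size (16), the random weights, and the softmax output are all assumptions chosen for the example; the thesis's actual architecture and back-propagation training are described in Chapters 4 and 6.

```python
import math
import random

# Illustrative sketch only: layer sizes, random weights and the softmax
# output are assumptions, not the network reported in the thesis.


def init_layer(n_in, n_out, rng):
    """Create small random weights (n_out rows of n_in) and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(image, hidden, output):
    """Flatten the image, apply a tanh hidden layer, then a softmax output."""
    x = [pixel for row in image for pixel in row]
    h = [math.tanh(sum(w * v for w, v in zip(ws, x)) + b)
         for ws, b in zip(*hidden)]
    z = [sum(w * v for w, v in zip(ws, h)) + b for ws, b in zip(*output)]
    z_max = max(z)                      # subtract the max for numerical stability
    exps = [math.exp(v - z_max) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
rows, cols, n_hidden, n_classes = 8, 12, 16, 22   # hypothetical sizes
image = [[rng.random() for _ in range(cols)] for _ in range(rows)]
hidden = init_layer(rows * cols, n_hidden, rng)
output = init_layer(n_hidden, n_classes, rng)

probs = forward(image, hidden, output)
predicted = max(range(n_classes), key=probs.__getitem__)
```

With untrained weights the predicted class is arbitrary; the point is only the data flow from spectrum image to a 22-way decision.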


ABSTRACT IN ARABIC

Estimates indicate that about two thousand deaf children are born each year in Malaysia. Most deaf Malaysian children have extremely poor speech intelligibility. Reduced intelligibility severely undermines communication and social interaction for the affected individuals. Although speech deficiencies among the deaf are difficult to overcome, learning how to produce intelligible speech is not an impossible task. Studies have shown that deaf children who receive Cued Speech can gain a reasonable degree of speech intelligibility, exceeding that of the majority of signing children in verbal language skills. A reliable measure of the speech intelligibility of deaf children is required for several reasons, among them: providing an index of the severity of the speech disorder; assisting in treatment decisions; and determining the changes that may result from intervention and treatment. This thesis studies an approach to measuring the speech intelligibility of deaf Malaysian children.

The discussion in this research begins with experiments on a practical Malay speech intelligibility test suitable for use in the training programme for deaf Malaysian children. In this study, the speech intelligibility of deaf children was measured through their ability to pronounce simple nonsense syllables (each consisting of a consonant and a vowel) for all twenty-two Malay consonants. The test should indicate how well these children can produce speech; the higher the score, the better the intelligibility of their speech.

The next course of action was to investigate a phoneme (sound) recognition system that would suit the Malay speech intelligibility test. A neural network was used to model effectively the distribution of the feature vectors present in speech signals for classification. In a novel approach, the speech spectrum image became the input to a three-layer neural network (the multi-layer perceptron). The input feature sets for intelligent phoneme identification were based on the intrinsic characteristics of Malay syllables that appear in the spectrum image of the captured speech signal. The spectrum images were produced from the most widely used speech filter algorithms, among them Mel-frequency cepstral coefficients, perceptual linear prediction, and the relative spectral transform. The classifiers were tested on twenty-two Malay phonemes (sounds) uttered by two boys and two girls from among Malaysian children. The performance of the Malay phoneme recognition system was measured and compared with the performance of a human listener. The successful development of the phoneme recognition system serves several purposes: (a) it will be one of the first methods employed to measure objectively the speech intelligibility of deaf Malaysian children, and (b) it will contribute to improving the assessment and management of the intervention programme for deaf Malaysian children.


APPROVAL PAGE

The thesis of Zulkhairi Mohd. Yusof has been approved by the following:

___________________________________

Mohiuddin Ahmed Supervisor

___________________________________

Ramlah Hussain Co-supervisor

___________________________________

Abdul Wahab Abdul Rahman Internal Examiner

___________________________________

Ong Yew Soon External Examiner

___________________________________

Nasr Eldin Ibrahim Ahmad Chairman


DECLARATION

I hereby declare that this thesis is the result of my own investigations, except where otherwise stated. I also declare that it has not been previously or concurrently submitted as a whole for any other degrees at IIUM or other institutions.

Zulkhairi Mohd. Yusof

Signature : ________________________ Date : _____________________


INTERNATIONAL ISLAMIC UNIVERSITY MALAYSIA

DECLARATION OF COPYRIGHT AND AFFIRMATION OF FAIR USE OF UNPUBLISHED RESEARCH

Copyright © 2011 by Zulkhairi Mohd. Yusof. All rights reserved.

MALAY SPEECH INTELLIGIBILITY TEST FOR DEAF CHILDREN:

PHONEME RECOGNITION USING ARTIFICIAL NEURAL NETWORK

No part of this unpublished research may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise without prior written permission of the copyright holder except as provided below.

1. Any material contained in or derived from this unpublished research may only be used by others in their writing with due acknowledgement.

2. IIUM or its library will have the right to make and transmit copies (print or electronic) for institutional and academic purposes.

3. The IIUM library will have the right to make, store in a retrieval system and supply copies of this unpublished research if requested by other universities and research libraries.

Affirmed by Zulkhairi Mohd. Yusof

_______________________ _______________

Signature Date


I would like to dedicate this work to my son, Muhammad Anas bin Zulkhairi, a deaf child from whom this work is inspired.


ACKNOWLEDGEMENT

Given the scale of the task (or struggle?) that is the Ph.D. thesis, and the acknowledgment that none of us exists in a vacuum, many thanks are in order. First is to Assoc. Prof. Dr. Mohiuddin Ahmed, my advisor during my postgraduate years. I am grateful for the guidance he provided during this project and for fostering an environment at IIUM that allowed for intellectual exploration, personal growth, and (just as important) lots of personal advice. I can honestly say that this experience will be unmatched in my life. I am also greatly indebted to my other supervisor, Asst. Prof. Dr. Ramlah Hussain, who provided much assistance, especially during my struggles in the early years of this work. By extension, I thank IIUM, UniKL-BMI, and the Cued Speech Center for allowing me the use of many of their resources, without which this work would not have been possible.

I would also like to thank Cik Roslina Ahmad of the Cued Speech Center, with whom I worked closely on the development of the speech intelligibility test component of this thesis. Her many ideas and suggestions are truly appreciated, and her enthusiasm drove my efforts considerably. A number of Cued Speech Center parents and staff, both past and present, were also instrumental to this thesis work. These include Cikgu Aini, Cikgu Mazrah, Cikgu Zakiah, Cikgu Roy, En. Nadzmi, En. Azmi and the rest.

Of course I would be remiss if I did not acknowledge my parents. I thank them for their patience, for always believing in me, and for setting a high standard of excellence, with word matched in deed. To them I say, “jazakaLLAHU khoiran kathiran”. Lastly, this work would not have been possible without the emotional support I have received during these graduate years from my lovely wife, Tahiah.


TABLE OF CONTENTS

Abstract
Abstract in Arabic
Approval Page
Declaration Page
Copyright Page
Dedication
Acknowledgement
Table of Contents
List of Tables
List of Figures

CHAPTER 1: Introduction
Background
Speech Intelligibility Test for Deaf Children
Phoneme Recognition
Artificial Neural Network
Thesis layout
Definition of Terms

CHAPTER 2: Speech Intelligibility Measure for Deaf Children
Introduction
The Deaf in Malaysia
Education for the deaf
Alternative education for the deaf
Cued Speech
The Success of Cued Speech
Cued Speech and Lip-reading
Cued Speech and Expressive/Receptive Language
Cued Speech and Reading
Cued Speech and Listening
Cued Speech and Sign Language
Cued Speech and Cochlear Implantation
Measuring the success of Cued Speech in Malay
Speech Intelligibility
Speech Intelligibility Measure
Interval Scaling Method
Mean Opinion Score
Attribute Estimation
Direct Magnitude Estimation
Identification Task Method
Diagnostic Rhyme Test (DRT)
Modified Rhyme Test (MRT)
Cluster Identification Test (CLID)
Phonetically Balanced Word Lists (PBWL)
Standard Segmental Test (SST)
Sentence Level Tests
Comprehension tests
Summary

CHAPTER 3: Front-end of Speech Recognition System
Introduction
Speech Recognition System
Computer-based speech recognition
Review of speech recognition research
Speech Recognition Process
Feature Extraction Algorithms
Linear Prediction Cepstral Coefficients (LPCC)
Pre-emphasis and Hamming windowing
Linear Predictive Analysis
Cepstral Analysis
Perceptual Linear Prediction Coefficients (PLP)
Hamming windowing and FFT
Bark-scale Filter Bank
Equal-loudness curve
Intensity-loudness compression
IDFT and Linear Predictive Analysis
Cepstral Analysis
Mel Frequency Cepstral Coefficients (MFCC)
Pre-emphasis, Hamming windowing and FFT
Mel scale Filter Bank
Logarithmic compression
DCT
RASTA-PLP
Summary

CHAPTER 4: Back-end of Speech Recognition System
Introduction
Back-end Speech Recognition System
Dynamic Time Warping (DTW)
Hidden Markov Model (HMM)
Artificial Neural Network (ANN)
Fundamentals of Neural Network
Processing units
Network architecture
Computing Algorithm
Input and output encodings
Learning algorithms
Measuring the length of training
Terminating the training process
Properties of neural networks
Generalizability and pattern recognition
Scaling
Plasticity and incremental learning
Distributed representation
Backpropagation
Back-propagation training algorithm
Back-propagation speed learning
Hyperbolic tangent
Momentum
Adaptive learning rate
Summary

CHAPTER 5: Malay Speech Intelligibility Test (MSIT)
Introduction
Research Design
Subjects
Standard reference of intelligibility measure
Sentence Level Test (Helen-questions)
Test Materials
Listeners
Procedures
Speech Recording
Playback for Listeners
Scoring
Results and Analysis
Word Level Test (Malay Modified Rhyme Test)
Test Materials
Listeners
Procedures
Speech Recording
Playback for Listeners
Scoring
Results and Analysis
Segmental Level Test (Nonsense Word Test)
Subjects
Test Materials
Listeners
Procedures
Speech Recording
Playback for Listeners
Data Collection
Results and Analysis
Summary

CHAPTER 6: Phoneme Classification
Introduction
Recognition of Phonemes
Spectrogram as speech features
Introduction to spectrograms
Generating spectrogram
Sampling
Quantization
Spectrogram creation methods
Spectrogram differences
Phonemes
Malay phoneme
Malay consonant phonemes
Research Design
Design Approach
The process
Spectral analysis
Matlab implementation of spectral analysis
Waveform and spectrogram
Power spectrum
Critical band spectrum
Loudness equalization and cube compression
LPC spectral analysis
MFC spectral analysis
PLP spectral analysis
RASTA-PLP spectral analysis
Locating the features
Phoneme segmentation
Speech tokens
Subjects
Phoneme classification
Designing the neural network
Network architecture
Hidden Units
Transfer Functions
Training algorithm
Training procedure
Experimental results on phoneme classification
Extended experiment: 10 subjects
Summary
Achievement
Phoneme detection based on image spectrum feature
Spectrum image can provide detailed analysis of failure

CHAPTER 7: Conclusion
Conclusion
Further works

BIBLIOGRAPHY
APPENDIX I: THE SUBJECTS
APPENDIX II: SENTENCE LEVEL TEST QUESTIONS
APPENDIX III: WORD LEVEL TEST QUESTIONS
APPENDIX IV: NONSENSE WORD TEST RESULTS
APPENDIX V: CUED SPEECH IN MALAY (CSM)
APPENDIX VI: THE INTERNATIONAL PHONETIC ALPHABET
APPENDIX VII: SELECTED MATLAB FUNCTIONS FOR SITS


LIST OF TABLES

2.1 Mean Opinion Score (MOS)
2.2 Example of Attribute Estimation
2.3 The DRT characteristics
2.4 Examples of the response sets in MRT
4.1 Neural Network notation terms and description
5.1 Category rating of deaf children speech in the Cued Speech Center
5.2 Ratings of 20 deaf children in Cued Speech Center as perceived by experienced teachers and speech therapist at the center
5.3 Comparison of ratings of 20 deaf children in Cued Speech Center between the experienced teachers and speech therapist vs. Helen’s rating
5.4 Examples of Malay MRT test sentences
5.5 Phonemes distribution in 200 words list of Malay Modified Rhyme Test
5.6 Phonemes distribution in an article in Malay newspaper
5.7 Comparison of ratings of 20 deaf children in Cued Speech Center between the experienced teachers and speech therapist vs. Nonsense Word Test
5.8 Details of the mean and standard deviation of the 30 children’s Nonsense Word Test scores
5.9 Levene’s Test for Equality of Variances for 30 children’s NWT scores
6.1 Recognition Results for Subject 1
6.2 Recognition Results for Subject 2
6.3 Recognition Results for Subject 3
6.4 Recognition Results for Subject 4
6.5 Individual result on recognition percentage for classifying 22 Malay phonemes
6.6 Recognition percentage for classifying 22 Malay phonemes based on gender
6.7 Confusion matrix; 22 Malay syllables using MFC spectrum for Subject 1
6.8 Confusion matrix; 22 Malay syllables using PLP spectrum for Subject 1
6.9 Confusion matrix; 22 Malay syllables using RASTA-PLP spectrum for Subject 1
6.10 Recognition percentage for classifying 22 Malay phonemes for 10 subjects


LIST OF FIGURES

1.1 Structure of Phoneme Recognizer
2.1 Pertuturan Kiu Bahasa Melayu (PKBM) or Cued Speech in Malay (CSM)
2.2 Reception of Key Words in sentences (Nicholls 1979)
3.1 Basic block diagram illustrating the major operations in speech recognition
3.2 Block diagram of LPCC algorithm
3.3 Block diagram of PLP algorithm
3.4 An example of centre frequency distribution in a Bark-scale filter bank (Nyquist frequency = 5 kHz (16.9 Barks))
3.5 Bark scale filter bank (sampling frequency = 16 kHz)
3.6 Block diagram of MFCC algorithm
3.7 Mel scale filter bank (sampling frequency = 16 kHz)
3.8 RASTA-PLP algorithm
4.1 A simple Hidden Markov Model, with two states and two output symbols, A and B
4.2 Architecture of a typical neural network
4.3 Neural network topologies: (a) unstructured, (b) layered, (c) recurrent, (d) modular (Tebelskis, 1995)
4.4 A three-layer back-propagation neural network
5.1 MSIT Comprehension Test: A – Interface for filling in subjects’ particulars. B – Recording page
5.2 MSIT Comprehension Test: C – Interface for filling in listeners’ details. D – Assessment page
5.3 The scoring page for Helen-questions test
5.4 Scores on Helen-questions in percent for Cued Speech Children
5.5 Typical article in Malay newspaper
5.6 Details of the subject are recorded prior to the session
5.7 The recording interface includes the speech material in large print, 2 seconds timer and the waveform of the speech signals, the red ‘repeat’ button and the green ‘continue’ button
5.8 Details of the assessor are recorded prior to the session
5.9 The assessment interface includes the set of 22 syllables frame and the waveform of the speech signals
5.10 Nonsense Word Test scores of 20 prelingually deaf children
5.11 Nonsense Word Test scores of 10 normal hearing children
5.12 Sample conversion matrix of 30 children’s Nonsense Word Test taken from Appendix D
5.13 MSIT scores for 20 deaf children based on the Nonsense Word Test
6.1 Spectrogram of the word "compute"
6.2 Block diagram illustrating the three major processes for phoneme recognition
6.3 An example of a speech waveform of Malay syllable /ba/ in time domain
6.4 Spectrogram of speech signals in 6.3. The color of each point shows the amplitude of a specific frequency at a certain time
6.5 Examples MFC spectrum of syllables /nya/, /fa/, /ba/ and /pa/
6.6 Examples MFC spectrum of syllable /ba/ repeated by the same subject three times
6.7 Waveform (top) and Spectrogram (bottom) generated from the Matlab code
6.8 Power spectrogram
6.9 Critical Band spectrogram
6.10 Output spectrogram after loudness equalization and cube compression
6.11 LPC cepstrum (top) and LPC spectrum (bottom)
6.12 MFC cepstrum (top), MFC power spectrum (middle) and MFC auditory spectrum (bottom)
6.13 PLP cepstrum (top) and PLP spectrum (bottom)
6.14 RASTA-PLP cepstrum (top) and RASTA-PLP spectrum (bottom)
6.15 MFC spectrum of Malay syllable /ba/ using short /a/ (left) and long /a/ (right)
6.16 Speech token location
6.17 The energy density for LPC, MFC, PLP and RASTA-PLP cepstrum of syllable /sya/
6.18 Phoneme segmentation process based on 0th order MFC cepstral coefficients
6.19 Phoneme segmentation error due to unnoticed existence of small valley after the central peak
6.20 Speech representations of Malay syllable /ba/ from its LPC, MFC, PLP and RASTA-PLP spectrum
6.21 40 ms speech tokens of Malay syllable /ba/ from its LPC, MFC, PLP and RASTA-PLP spectrum
6.22 22 speech tokens representation for Malay syllables using LPC spectrum (left) and MFC spectrum (right)
6.23 22 speech tokens representation for Malay syllables using PLP spectrum (left) and RASTA-PLP spectrum (right)
6.24 Speech tokens for Malay syllable /ba/ by four Malay children speakers using MFC, PLP and RASTA-PLP spectrum
6.25 Speech tokens for Malay syllable /ba/ by four Malay children speakers repeated three times using MFC spectrum
6.26 Speech token image become inputs to the MLP neural network
6.27 Speech Recognition Process using speech spectrum as inputs to MLP Neural Network
6.28 Several popular nonlinear transfer functions (Tebelskis, 1995)
6.29 Number of consonants vs. the recognition percentage using PLP-based spectrum image method for 10 subjects
6.30 Phonetic features of consonants
6.31 Speech spectrum for token /ba/ showing the phonetic features for consonant ‘b’; voiced, plosive and bilabial
6.32 Speech spectrum for token /ba/ for 10 subjects showing the similar phonetic features in consonant ‘b’; voiced, plosive and bilabial
6.33 (a) normal syllable /ba/ and (b) erratic syllable /ba/ without the voiced phonetic feature

CHAPTER ONE

INTRODUCTION

BACKGROUND

Profoundly deaf children have been shown to be able to attain intelligible speech. They can be taught to talk, although their levels of speech skill have historically been poor (Perigoe & Le Blanc, 1994). Reduced intelligibility severely compromises communication and social interaction for affected individuals.

Throughout the world, very little public assistance is channeled towards preventing young deaf children from becoming dumb adults. This is because the main handicap of these children is not only invisible but also badly misunderstood. Among the handicapped, deaf children currently receive the most controversial, and probably the least effective help from the specialists responsible for their welfare and development.

It should be more widely appreciated that the most serious handicap which deafness inflicts upon children is not the lack of hearing, but rather the lack of verbal (spoken/written) language development - a consequence of inability to understand speech, which prevents deaf children from picking up verbal language in the same natural way as other children do.

This lack of verbal language development in deaf children retards their intellectual development, impedes their education, restricts their avenues for gainful employment, confines their membership in a social minority, deprives them of their cultural heritage, and prevents them from attaining a normal quality of life. Almost all deaf (as distinct from partially hearing) children suffer this sad fate, even in the world's most highly developed countries.

Although speech deficiencies in the deaf are quite difficult to overcome, learning to produce intelligible speech is not an impossible task (Perigoe & Le Blanc, 1994). If a deaf child can receive a great deal of individual speech instruction from highly skilled teachers, the child may learn speech that is reasonably intelligible and very useful. Unfortunately, few deaf children reach this level of speech production, in part because of the limited amount of their time that is available for speech training, and in part because of the shortage of highly proficient teachers and speech therapists. If more efficient means could be found for alleviating the deficient speech communication of the deaf, their educational possibilities would be greatly improved, as would their ability to make their way in a hearing world.

Until the late 1960s, the acquisition of verbal language – as distinct from sign language – had always represented an almost insuperable task for the deaf. However, over the last decades, researchers have demonstrated that even totally deaf children can be helped to acquire verbal languages visually with almost the same ease as they can pick up sign languages. In 1967, Dr. R. Orin Cornett demonstrated a means he had devised to help deaf people receive spoken language. It was named ‘Cued Speech’ because hand cues are used simultaneously with normal speech to eliminate ambiguity from lip-reading. Many studies have demonstrated that deaf children using Cued Speech outperform the majority of children using sign language in verbal language skills (Peterson, 1991). The effectiveness of Cued Speech has been established in many different areas, including improved speech intelligibility. Berendt et al. (1990) demonstrated that children who were profoundly deaf and had used Cued Speech for more than two years scored better than 92% in a receptive language test using the Rhode Island Test of Language Structure (RITLS). Cued Speech children also scored as well as normal hearing children in the expressive language test using the Developmental Sentence Score (DSS).

Cued Speech was adapted for use with the Malay language in September 1982 and was given the name Cued Speech in Malay (CSM), or Pertuturan Kiu Bahasa Melayu (PKBM) in Malay. An ongoing project initiated by the National Society for the Deaf (NSD) has clearly shown that deaf Malaysian children can be taught to speak effectively in Malay through the use of CSM (Tan, 1997). The school set up by NSD, the Cued Speech Centre (CSC), focuses on teaching deaf children to use their voices to communicate with normal hearing people. According to Tan, CSM is at present the only practicable means through which successful early development of the Malay language in deaf children is possible, regardless of their extent of hearing loss. The children are also able to develop the same age-level reading and writing skills as their hearing peers. The most noticeable feature of CSM is that, whereas children who sign rarely attempt to speak each word that they sign or fingerspell, CSM children invariably vocalize every syllable that they cue.

SPEECH INTELLIGIBILITY TEST FOR DEAF CHILDREN

It is becoming increasingly difficult to ignore the contribution of CSM in helping deaf Malaysian children to speak. These children have shown the ability to speak up to a certain level of intelligibility. Although this diversity in speech clarity, or speech intelligibility, exists in the school, there is no systematic approach for ranking these children by speech intelligibility and then developing their skills up to a standard accepted by normal hearing people. There are various tests which can be used to measure proficiency in the English language: the Maryland Syntax Evaluation Instrument (MSEI), the Expressive One Word Picture Vocabulary Test (EOWPVT), the Rhode Island Test of Language Structure (RITLS), the Developmental Sentence Score (DSS) and several others. However, far too little attention has been paid to a standard objective speech intelligibility test for use within training programmes for deaf Malaysian children. The existing speech perceptual test procedures, normally conducted by speech pathologists, have two main shortcomings: they rely on subjective evaluation, and their results can be recognized only by trained professionals. This indicates a need to understand the various methods available for speech intelligibility testing and to investigate an appropriate speech intelligibility test system for deaf Malaysian children.

One aim of this research is to improve on the results obtained by previous work on speech intelligibility measures for deaf Malaysian children. In Malaysia, most software used for the diagnosis and management of deaf children is based on the English language, which is unsuitable given the great differences between the two languages. In a non-phonetic language such as English, spelling often has little relation to actual pronunciation. The Malay language, on the other hand, is spoken just as it is written most of the time. Its written consonants and vowels each have a standard pronunciation.

In the Malay language, most spoken syllables are in CV (consonant-vowel) or CVC (consonant-vowel-consonant) form, which makes the synthesis a bit easier than with other languages. Each of the 22 Malay consonants (b, c, d, f, g, h, j, k, l, m, n, p, r, s, t, v, w, y, z, sy, ng, ny) occurs once with each Malay vowel (a, o, e, è, i, u) to form a CV syllable. It is expected that if a child is able to pronounce all the consonants correctly, then his or her speech intelligibility is excellent. On the
