MALAY SPEECH INTELLIGIBILITY TEST FOR DEAF CHILDREN:
PHONEME RECOGNITION USING ARTIFICIAL NEURAL NETWORK
BY
ZULKHAIRI MOHD. YUSOF
A thesis submitted in fulfilment of the requirement for the degree of Doctor of Philosophy in Management Information System
Kulliyyah of Information Communication and Technology
International Islamic University Malaysia
FEBRUARY 2011
ABSTRACT
It is estimated that about 2000 deaf children are born each year in Malaysia. Most deaf Malaysian children have very poor speech intelligibility. Reduced intelligibility severely compromises communication and social interaction for affected individuals.
Although speech deficiencies in the deaf are quite difficult to overcome, learning to produce intelligible speech is not an impossible task. Studies have shown that deaf children who receive Cued Speech can acquire reasonable speech intelligibility, surpassing the majority of signing children in verbal language skills. A reliable measure of speech intelligibility for deaf children is required for several reasons: to provide an index of the severity of speech disorder, to assist in treatment decisions, and to quantify changes that may result from intervention or treatment. This thesis investigates an approach to measuring the speech intelligibility of deaf Malaysian children.
The research discussed in this work begins with experiments on a practical Malay Speech Intelligibility Test (MSIT) suitable for use within training programmes for deaf Malaysian children. In this study, the speech intelligibility of deaf children is measured through their ability to say simple nonsense syllables (each consisting of a consonant and a vowel) for all 22 Malay consonants. The MSIT score should indicate how well these children can produce speech; the higher the score, the better their speech intelligibility.
The next step was to investigate a phoneme recognition system suited to the MSIT. An artificial neural network was used to model the distribution of the feature vectors present in speech signals for classification. In a novel approach, speech spectrum images serve as the inputs to a three-layer multi-layer perceptron (MLP) neural network. The input feature sets for intelligent phoneme identification were based on the intrinsic characteristics of Malay syllables visible in the spectrum image of the captured speech signal. The spectrum images were produced with three widely used speech filtering algorithms: Mel-frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP) and Relative Spectral Transform - Perceptual Linear Prediction (RASTA-PLP). The classifiers were tested on utterances of twenty-two Malay phonemes from four child speakers, two male and two female. The performance of the system in recognizing Malay phonemes was measured and compared with that of a human listener. The successful development of the phoneme recognition system serves several purposes: (a) it will be one of the first methods employed to objectively measure the speech intelligibility of deaf Malaysian children, and (b) it will contribute to better assessment and management of intervention programmes for deaf Malaysian children.
ABSTRACT IN ARABIC
Estimates indicate that about two thousand deaf children are born each year in Malaysia. Most deaf Malaysian children have very poor speech intelligibility. Reduced intelligibility severely undermines communication and social interaction for the affected individuals. Although speech deficiencies among the deaf are difficult to overcome, learning to produce intelligible speech is not an impossible task. Studies have shown that deaf children who receive Cued Speech can acquire a reasonable degree of speech intelligibility, surpassing the majority of signing children in verbal language skills. A reliable measure of the speech intelligibility of deaf children is required for several reasons: to provide an index of the severity of the speech disorder, to assist in treatment decisions, and to identify changes that may result from intervention and treatment. This thesis examines an approach to measuring the speech intelligibility of deaf Malaysian children.
The discussion in this research begins with experiments on a practical Malay speech intelligibility test suitable for use in the training programme for deaf Malaysian children. In this study, the speech intelligibility of deaf children was measured through their ability to pronounce simple nonsense syllables (each consisting of a consonant and a vowel) for all twenty-two Malay consonants. The practical Malay speech intelligibility test should indicate how well these children can produce speech; the higher the score, the better their speech intelligibility.
The next course of action was to investigate a phoneme (sound) recognition system that would suit the practical Malay speech intelligibility test. A neural network was used to model effectively the distribution of the feature vectors present in speech signals for classification. In a novel approach, speech spectrum images became the inputs to a three-layer neural network (a multi-layer perceptron). The input feature sets for intelligent phoneme identification were based on the intrinsic characteristics of Malay syllables that appear in the spectrum image of the captured speech signal. The spectrum images were produced by the most widely used speech filtering algorithms, among them Mel-frequency cepstral coefficients, perceptual linear prediction and the relative spectral transform. The classifiers were tested on twenty-two Malay phonemes uttered by two boys and two girls from among Malaysian children. The performance of the Malay phoneme recognition system was measured and compared with the performance of a human listener. The successful development of the phoneme recognition system serves several purposes: (a) it will be one of the first methods employed to objectively measure the speech intelligibility of deaf Malaysian children, and (b) it will contribute to better assessment and management of the intervention programme for deaf Malaysian children.
APPROVAL PAGE
The thesis of Zulkhairi Mohd. Yusof has been approved by the following:
___________________________________
Mohiuddin Ahmed
Supervisor
___________________________________
Ramlah Hussain
Co-supervisor
___________________________________
Abdul Wahab Abdul Rahman
Internal Examiner
___________________________________
Ong Yew Soon
External Examiner
___________________________________
Nasr Eldin Ibrahim Ahmad
Chairman
DECLARATION
I hereby declare that this thesis is the result of my own investigations, except where otherwise stated. I also declare that it has not been previously or concurrently submitted as a whole for any other degrees at IIUM or other institutions.
Zulkhairi Mohd. Yusof
Signature : ________________________ Date : _____________________
INTERNATIONAL ISLAMIC UNIVERSITY MALAYSIA
DECLARATION OF COPYRIGHT AND AFFIRMATION OF FAIR USE OF UNPUBLISHED RESEARCH
Copyright © 2011 by Zulkhairi Mohd. Yusof. All rights reserved.
MALAY SPEECH INTELLIGIBILITY TEST FOR DEAF CHILDREN:
PHONEME RECOGNITION USING ARTIFICIAL NEURAL NETWORK
No part of this unpublished research may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the copyright holder except as provided below.
1. Any material contained in or derived from this unpublished research may only be used by others in their writing with due acknowledgement.
2. IIUM or its library will have the right to make and transmit copies (print or electronic) for institutional and academic purposes.
3. The IIUM library will have the right to make, store in a retrieval system and supply copies of this unpublished research if requested by other universities and research libraries.
Affirmed by Zulkhairi Mohd. Yusof
_______________________ _______________
Signature Date
I would like to dedicate this work to my son, Muhammad Anas bin Zulkhairi, a deaf child by whom this work was inspired.
ACKNOWLEDGEMENT
Given the scale of the task (or struggle?) that is the Ph.D. thesis, and the acknowledgment that none of us exists in a vacuum, many thanks are in order. The first go to Assoc. Prof. Dr. Mohiuddin Ahmed, my advisor during my postgraduate years. I am grateful for the guidance he provided during this project and for fostering an environment at IIUM that allowed for intellectual exploration, personal growth, and (just as important) lots of personal advice. I can honestly say that this experience will be unmatched in my life. I am also greatly indebted to my other supervisor, Asst. Prof. Dr. Ramlah Hussain, who provided much assistance, especially during my struggles in the early years of this work. By extension, I thank IIUM, UniKL-BMI, and the Cued Speech Center for allowing me the use of many of their resources, without which this work would not have been possible.
I would also like to thank Cik Roslina Ahmad of the Cued Speech Center, with whom I worked closely on the development of the speech intelligibility test component of this thesis. Her many ideas and suggestions are truly appreciated, and her enthusiasm drove my efforts considerably. A number of Cued Speech Center parents and staff, both past and present, were also instrumental to this thesis work. These include Cikgu Aini, Cikgu Mazrah, Cikgu Zakiah, Cikgu Roy, En. Nadzmi, En. Azmi and the rest.
Of course, I would be remiss if I did not acknowledge my parents. I thank them for their patience, for always believing in me, and for setting a high standard of excellence, with word matched in deed. To them I say, “jazakaLLAHU khoiran kathiran”. Lastly, this work would not have been possible without the emotional support I have received during these graduate years from my lovely wife, Tahiah.
TABLE OF CONTENTS
Abstract
Abstract in Arabic
Approval Page
Declaration Page
Copyright Page
Dedication
Acknowledgement
Table of Contents
List of Tables
List of Figures
CHAPTER 1: Introduction
Background
Speech Intelligibility Test for Deaf Children
Phoneme Recognition
Artificial Neural Network
Thesis layout
Definition of Terms
CHAPTER 2: Speech Intelligibility Measure for Deaf Children
Introduction
The Deaf in Malaysia
Education for the deaf
Alternative education for the deaf
Cued Speech
The Success of Cued Speech
Cued Speech and Lip-reading
Cued Speech and Expressive/Receptive Language
Cued Speech and Reading
Cued Speech and Listening
Cued Speech and Sign Language
Cued Speech and Cochlear Implantation
Measuring the success of Cued Speech in Malay
Speech Intelligibility
Speech Intelligibility Measure
Interval Scaling Method
Mean Opinion Score
Attribute Estimation
Direct Magnitude Estimation
Identification Task Method
Diagnostic Rhyme Test (DRT)
Modified Rhyme Test (MRT)
Cluster Identification Test (CLID)
Phonetically Balanced Word Lists (PBWL)
Standard Segmental Test (SST)
Sentence Level Tests
Comprehension tests
Summary
CHAPTER 3: Front-end of Speech Recognition System
Introduction
Speech Recognition System
Computer-based speech recognition
Review of speech recognition research
Speech Recognition Process
Feature Extraction Algorithms
Linear Prediction Cepstral Coefficients (LPCC)
Pre-emphasis and Hamming windowing
Linear Predictive Analysis
Cepstral Analysis
Perceptual Linear Prediction Coefficients (PLP)
Hamming windowing and FFT
Bark-scale Filter Bank
Equal-loudness curve
Intensity-loudness compression
IDFT and Linear Predictive Analysis
Cepstral Analysis
Mel Frequency Cepstral Coefficients (MFCC)
Pre-emphasis, Hamming windowing and FFT
Mel scale Filter Bank
Logarithmic compression
DCT
RASTA-PLP
Summary
CHAPTER 4: Back-end of Speech Recognition System
Introduction
Back-end Speech Recognition System
Dynamic Time Warping (DTW)
Hidden Markov Model (HMM)
Artificial Neural Network (ANN)
Fundamentals of Neural Network
Processing units
Network architecture
Computing Algorithm
Input and output encodings
Learning algorithms
Measuring the length of training
Terminating the training process
Properties of neural networks
Generalizability and pattern recognition
Scaling
Plasticity and incremental learning
Distributed representation
Backpropagation
Back-propagation training algorithm
Back-propagation speed learning
Hyperbolic tangent
Momentum
Adaptive learning rate
Summary
CHAPTER 5: Malay Speech Intelligibility Test (MSIT)
Introduction
Research Design
Subjects
Standard reference of intelligibility measure
Sentence Level Test (Helen-questions)
Test Materials
Listeners
Procedures
Speech Recording
Playback for Listeners
Scoring
Results and Analysis
Word Level Test (Malay Modified Rhyme Test)
Test Materials
Listeners
Procedures
Speech Recording
Playback for Listeners
Scoring
Results and Analysis
Segmental Level Test (Nonsense Word Test)
Subjects
Test Materials
Listeners
Procedures
Speech Recording
Playback for Listeners
Data Collection
Results and Analysis
Summary
CHAPTER 6: Phoneme Classification
Introduction
Recognition of Phonemes
Spectrogram as speech features
Introduction to spectrograms
Generating spectrogram
Sampling
Quantization
Spectrogram creation methods
Spectrogram differences
Phonemes
Malay phoneme
Malay consonant phonemes
Research Design
Design Approach
The process
Spectral analysis
Matlab implementation of spectral analysis
Waveform and spectrogram
Power spectrum
Critical band spectrum
Loudness equalization and cube compression
LPC spectral analysis
MFC spectral analysis
PLP spectral analysis
RASTA-PLP spectral analysis
Locating the features
Phoneme segmentation
Speech tokens
Subjects
Phoneme classification
Designing the neural network
Network architecture
Hidden Units
Transfer Functions
Training algorithm
Training procedure
Experimental results on phoneme classification
Extended experiment: 10 subjects
Summary
Achievement
Phoneme detection based on image spectrum feature
Spectrum image can provide detailed analysis of failure
CHAPTER 7: Conclusion
Conclusion
Further works
BIBLIOGRAPHY
APPENDIX I: THE SUBJECTS
APPENDIX II: SENTENCE LEVEL TEST QUESTIONS
APPENDIX III: WORD LEVEL TEST QUESTIONS
APPENDIX IV: NONSENSE WORD TEST RESULTS
APPENDIX V: CUED SPEECH IN MALAY (CSM)
APPENDIX VI: THE INTERNATIONAL PHONETIC ALPHABET
APPENDIX VII: SELECTED MATLAB FUNCTIONS FOR SITS
LIST OF TABLES
Table No.
2.1 Mean Opinion Score (MOS)
2.2 Example of Attribute Estimation
2.3 The DRT characteristics
2.4 Examples of the response sets in MRT
4.1 Neural Network notation terms and description
5.1 Category rating of deaf children speech in the Cued Speech Center
5.2 Ratings of 20 deaf children in Cued Speech Center as perceived by experienced teachers and speech therapist at the center
5.3 Comparison of ratings of 20 deaf children in Cued Speech Center between the experienced teachers and speech therapist vs. Helen’s rating
5.4 Examples of Malay MRT test sentences
5.5 Phonemes distribution in 200 words list of Malay Modified Rhyme Test
5.6 Phonemes distribution in an article in Malay newspaper
5.7 Comparison of ratings of 20 deaf children in Cued Speech Center between the experienced teachers and speech therapist vs Nonsense Word Test
5.8 Details of the mean and standard deviation of the 30 children’s Nonsense Word Test scores
5.9 Levene’s Test for Equality of Variances for 30 children’s NWT scores
6.1 Recognition Results for Subject 1
6.2 Recognition Results for Subject 2
6.3 Recognition Results for Subject 3
6.4 Recognition Results for Subject 4
6.5 Individual result on recognition percentage for classifying 22 Malay phonemes
6.6 Recognition percentage for classifying 22 Malay phonemes based on gender
6.7 Confusion matrix; 22 Malay syllables using MFC spectrum for Subject 1
6.8 Confusion matrix; 22 Malay syllables using PLP spectrum for Subject 1
6.9 Confusion matrix; 22 Malay syllables using RASTA-PLP spectrum for Subject 1
6.10 Recognition percentage for classifying 22 Malay phonemes for 10 subjects
LIST OF FIGURES
Figure No.
1.1 Structure of Phoneme Recognizer
2.1 Pertuturan Kiu Bahasa Melayu (PKBM) or Cued Speech in Malay (CSM)
2.2 Reception of Key Words in sentences (Nicholls 1979)
3.1 Basic block diagram illustrating the major operations in speech recognition
3.2 Block diagram of LPCC algorithm
3.3 Block diagram of PLP algorithm
3.4 An example of centre frequency distribution in a Bark-scale filter bank (Nyquist frequency = 5 kHz (16.9 Barks))
3.5 Bark scale filter bank (sampling frequency = 16 kHz)
3.6 Block diagram of MFCC algorithm
3.7 Mel scale filter bank (sampling frequency = 16 kHz)
3.8 RASTA-PLP algorithm
4.1 A simple Hidden Markov Model, with two states and two output symbols, A and B
4.2 Architecture of a typical neural network
4.3 Neural network topologies: (a) unstructured, (b) layered, (c) recurrent, (d) modular (Tebelskis, 1995)
4.4 A three-layer back-propagation neural network
5.1 MSIT Comprehension Test: A – Interface for filling in subjects’ particulars. B – Recording page
5.2 MSIT Comprehension Test: C – Interface for filling in listeners’ details. D – Assessment page
5.3 The scoring page for Helen-questions test
5.4 Scores on Helen-questions in percent for Cued Speech Children
5.5 Typical article in Malay newspaper
5.6 Details of the subject are recorded prior to the session
5.7 The recording interface includes the speech material in large print, 2 seconds timer and the waveform of the speech signals, the red ‘repeat’ button and the green ‘continue’ button
5.8 Details of the assessor are recorded prior to the session
5.9 The assessment interface includes the set of 22 syllables frame and the waveform of the speech signals
5.10 Nonsense Word Test scores of 20 prelingually deaf children
5.11 Nonsense Word Test scores of 10 normal hearing children
5.12 Sample conversion matrix of 30 children’s Nonsense Word Test taken from Appendix D
5.13 MSIT scores for 20 deaf children based on the Nonsense Word Test
6.1 Spectrogram of the word "compute"
6.2 Block diagram illustrating the three major processes for phoneme recognition
6.3 An example of a speech waveform of Malay syllable /ba/ in time domain
6.4 Spectrogram of speech signals in 6.3. The color of each point shows the amplitude of a specific frequency at a certain time
6.5 Examples MFC spectrum of syllables /nya/, /fa/, /ba/ and /pa/
6.6 Examples MFC spectrum of syllable /ba/ repeated by the same subject three times
6.7 Waveform (top) and Spectrogram (bottom) generated from the Matlab code
6.8 Power spectrogram
6.9 Critical Band spectrogram
6.10 Output spectrogram after loudness equalization and cube compression
6.11 LPC cepstrum (top) and LPC spectrum (bottom)
6.12 MFC cepstrum (top), MFC power spectrum (middle) and MFC auditory spectrum (bottom)
6.13 PLP cepstrum (top) and PLP spectrum (bottom)
6.14 RASTA-PLP cepstrum (top) and RASTA-PLP spectrum (bottom)
6.15 MFC spectrum of Malay syllable /ba/ using short /a/ (left) and long /a/ (right)
6.16 Speech token location
6.17 The energy density for LPC, MFC, PLP and RASTA-PLP cepstrum of syllable /sya/
6.18 Phoneme segmentation process based on 0th order MFC cepstral coefficients
6.19 Phoneme segmentation error due to unnoticed existence of small valley after the central peak
6.20 Speech representations of Malay syllable /ba/ from its LPC, MFC, PLP and RASTA-PLP spectrum
6.21 40 ms speech tokens of Malay syllable /ba/ from its LPC, MFC, PLP and RASTA-PLP spectrum
6.22 22 speech tokens representation for Malay syllables using LPC spectrum (left) and MFC spectrum (right)
6.23 22 speech tokens representation for Malay syllables using PLP spectrum (left) and RASTA-PLP spectrum (right)
6.24 Speech tokens for Malay syllable /ba/ by four Malay children speakers using MFC, PLP and RASTA-PLP spectrum
6.25 Speech tokens for Malay syllable /ba/ by four Malay children speakers repeated three times using MFC spectrum
6.26 Speech token image become inputs to the MLP neural network
6.27 Speech Recognition Process using speech spectrum as inputs to MLP Neural Network
6.28 Several popular nonlinear transfer functions (Tebelskis, 1995)
6.29 Number of consonants vs. the recognition percentage using PLP-based spectrum image method for 10 subjects
6.30 Phonetic features of consonants
6.31 Speech spectrum for token /ba/ showing the phonetic features for consonant ‘b’; voiced, plosive and bilabial
6.32 Speech spectrum for token /ba/ for 10 subjects showing the similar phonetic features in consonant ‘b’; voiced, plosive and bilabial
6.33 (a) normal syllable /ba/ and (b) erratic syllable /ba/ without the voiced phonetic feature
CHAPTER ONE
INTRODUCTION
BACKGROUND
Profoundly deaf children have been shown to be able to attain intelligible speech. They can be taught to talk, although their levels of speech skill have historically been poor (Perigoe & Le Blanc, 1994). Reduced intelligibility severely compromises communication and social interaction for affected individuals.
Throughout the world, very little public assistance is channeled towards preventing young deaf children from becoming mute adults. This is because the main handicap of these children is not only invisible but also badly misunderstood. Among the handicapped, deaf children currently receive the most controversial, and probably the least effective, help from the specialists responsible for their welfare and development.
It should be more widely appreciated that the most serious handicap which deafness inflicts upon children is not the lack of hearing, but rather the lack of verbal (spoken/written) language development. This is a consequence of the inability to understand speech, which prevents deaf children from picking up verbal language in the same natural way as other children do.
This lack of verbal language development in deaf children retards their intellectual development, impedes their education, restricts their avenues for gainful employment, confines them to membership of a social minority, deprives them of their cultural heritage, and prevents them from attaining a normal quality of life. Almost all deaf (as distinct from partially hearing) children suffer this sad fate, even in the world's most highly developed countries.
Although speech deficiencies in the deaf are quite difficult to overcome, learning to produce intelligible speech is not an impossible task (Perigoe & Le Blanc, 1994). If a deaf child receives a great deal of individual speech instruction from highly skilled teachers, the child may learn speech that is reasonably intelligible and very useful. Unfortunately, few deaf children reach this level of speech production, in part because of the limited time available to them for speech training, and in part because of the shortage of highly proficient teachers and speech therapists. If more efficient means could be found for alleviating the deficient speech communication of the deaf, their educational possibilities would be greatly improved, as would their ability to make their way in a hearing world.
Until the late 1960s, the acquisition of verbal language, as distinct from sign language, had always represented an almost insuperable task for the deaf. However, over the last decades, researchers have demonstrated that even totally deaf children can be helped to acquire verbal languages visually with almost the same ease as they can pick up sign languages. In 1967, Dr. R. Orin Cornett demonstrated a means he had devised for helping the deaf to receive spoken language. It was named ‘Cued Speech’ because hand cues are used simultaneously with normal speech to eliminate ambiguity from lip-reading. Many studies have demonstrated that deaf children using Cued Speech outperform the majority of children using sign language in verbal language skills (Peterson, 1991). The effectiveness of Cued Speech has been established in many different areas, including the improvement of speech intelligibility. Berendt et al. (1990) demonstrated that children who were profoundly deaf and had used Cued Speech for more than two years scored better than 92% on a receptive language test, the Rhode Island Test of Language Structure (RITLS). Cued Speech children also scored as well as normal-hearing children on an expressive language test, the Developmental Sentence Score (DSS).
Cued Speech was adapted for use with the Malay language in September 1982, and was given the name Cued Speech in Malay (CSM), or Pertuturan Kiu Bahasa Melayu (PKBM) in Malay. An ongoing project initiated by the National Society for the Deaf (NSD) has clearly shown that deaf Malaysian children can be taught to speak effectively in Malay through the use of CSM (Tan, 1997). The school set up by NSD, the Cued Speech Centre (CSC), focuses on teaching deaf children to use their voices to communicate with normal-hearing people. According to Tan, CSM is at present the only practicable means through which successful early development of the Malay language in deaf children is possible, regardless of the extent of their hearing loss. The children are also able to develop reading and writing skills at the same age level as their hearing peers. The most noticeable feature of CSM is that, whereas children who sign rarely attempt to speak each word that they sign or fingerspell, CSM children invariably vocalize every syllable that they cue.
SPEECH INTELLIGIBILITY TEST FOR DEAF CHILDREN
It is becoming increasingly difficult to ignore the contribution of CSM in helping deaf Malaysian children to speak. These children have shown the ability to speak up to a certain level of intelligibility. Although speech clarity, or speech intelligibility, varies among the children in the school, there is no systematic approach for ranking them by speech intelligibility and then developing their skills to a standard acceptable to normal-hearing listeners. There are various tests which can be used to measure proficiency in the English language: the Maryland Syntax Evaluation Instrument (MSEI), the Expressive One Word Picture Vocabulary Test (EOWPVT), the Rhode Island Test of Language Structure (RITLS), the Developmental Sentence Score (DSS) and several others. However, far too little attention has been paid to a standard, objective speech intelligibility test for use within training programmes for deaf Malaysian children. The existing perceptual speech test procedures, normally conducted by speech pathologists, have two main shortcomings: they rely on subjective evaluation, and they can be carried out only by trained professionals. This indicates a need to understand the various methods available for speech intelligibility testing and to investigate an appropriate speech intelligibility test system for deaf Malaysian children.
One aim of this research is to improve on the results obtained by previous work on speech intelligibility measures for deaf Malaysian children. In Malaysia, most software used for the diagnosis and management of deaf children is based on the English language, which is unsuitable given the great differences between the two languages. In a non-phonetic language such as English, spelling often has little relation to actual pronunciation. Malay, on the other hand, is spoken just as it is written most of the time. Each of its written consonants and vowels has a standard pronunciation.
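Because Malay spelling tracks pronunciation closely, a list of consonant-vowel test items can be generated directly from the written consonants and vowels of the language. The following is a minimal Python sketch of that idea, not the implementation used in this thesis. The function names (`cv_syllables`, `split_cv`, `percent_correct`) are hypothetical, and scoring by the percentage of syllables a listener identifies correctly is an assumption made purely for illustration.

```python
# Written Malay consonants and vowels as listed in this chapter.
CONSONANTS = ["b", "c", "d", "f", "g", "h", "j", "k", "l", "m", "n",
              "p", "r", "s", "t", "v", "w", "y", "z", "sy", "ng", "ny"]
VOWELS = ["a", "o", "e", "è", "i", "u"]


def cv_syllables(vowel="a"):
    """Return one consonant-vowel (CV) syllable per consonant."""
    if vowel not in VOWELS:
        raise ValueError(f"unknown vowel: {vowel!r}")
    return [consonant + vowel for consonant in CONSONANTS]


def split_cv(syllable):
    """Split a written CV syllable into (consonant, vowel).

    Longer consonants are tried first so that the digraphs
    sy, ng and ny are not mistaken for s or n.
    """
    for consonant in sorted(CONSONANTS, key=len, reverse=True):
        rest = syllable[len(consonant):]
        if syllable.startswith(consonant) and rest in VOWELS:
            return consonant, rest
    raise ValueError(f"not a CV syllable: {syllable!r}")


def percent_correct(targets, responses):
    """Assumed scoring rule: percentage of target syllables that the
    listener wrote down exactly as intended."""
    if not targets or len(targets) != len(responses):
        raise ValueError("targets and responses must be non-empty and equal in length")
    hits = sum(target == heard for target, heard in zip(targets, responses))
    return 100.0 * hits / len(targets)


if __name__ == "__main__":
    targets = cv_syllables("a")      # "ba", "ca", ..., "nya"
    responses = list(targets)
    responses[0] = "pa"              # listener heard /pa/ instead of /ba/
    print(split_cv("nya"))
    print(round(percent_correct(targets, responses), 1))
```

The longest-match rule in `split_cv` matters only for the three two-letter consonants; every other written consonant maps to a single letter, which is what makes this kind of item list straightforward to build for Malay.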
In the Malay language, most spoken syllables take the CV (consonant-vowel) or CVC (consonant-vowel-consonant) form, which makes the synthesis a bit easier than with other languages. The 22 Malay consonants (b, c, d, f, g, h, j, k, l, m, n, p, r, s, t, v, w, y, z, sy, ng, ny) occur once with each Malay vowel (a, o, e, è, i, u) to form a CV syllable. It is expected that if a child is able to pronounce all the consonants correctly, then his or her speech intelligibility is excellent. On the