PM10 CONCENTRATIONS SHORT TERM PREDICTION USING REGRESSION, ARTIFICIAL NEURAL NETWORK AND HYBRID MODELS
by
AHMAD ZIA UL-SAUFIE MOHAMAD JAPERI
Thesis submitted in fulfillment of the requirements for the degree of Doctor of Philosophy
ACKNOWLEDGEMENT
First and above all, I praise Allah, the Almighty, for providing me this opportunity and granting me the capability to complete it successfully. This thesis appears in its current form due to the assistance and guidance of several people and organizations. I would therefore like to offer my sincere thanks to all of them.
I would like to express my greatest appreciation and thanks to my supervisor, Associate Professor Ahmad Shukri Yahaya, and my co-supervisor, Professor Dr. Nor Azam Ramli, for letting me be under their supervision. I really appreciate all the guidance, important suggestions, support, advice and continuous encouragement in completing my PhD. Not forgotten, my big thanks to all my friends in the Clean Air Research Group, Dr. Hazrul, Zul Azmi, Dr. Izma, Norrimi, Hasfazilah, Maisarah, Azian, Maher and Nazatul, for their cooperation and help during my study.
Lastly and most importantly, I would like to dedicate this thesis to my parents, Mohamad Japeri Hassim and Azizah Awang, for their good wishes, continuous encouragement and motivation. To my wife, Wan Nor Aishah Meor Hussain, thank you for always being there for me. My son, Umar Danish, and my daughter, Fatimah Tasnim, inspired me to face the challenges and complete this research. Finally, I wish to express my biggest acknowledgement to Universiti Teknologi Mara for providing me financial support under Skim Latihan Akademik IPTA (SLAI).
TABLE OF CONTENTS
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
ABSTRAK
ABSTRACT
CHAPTER 1 : INTRODUCTION
1.0 INTRODUCTION
1.1 AIR POLLUTION IN MALAYSIA
1.2 PROBLEM STATEMENT
1.3 OBJECTIVES
1.4 SCOPE OF RESEARCH
1.5 THESIS LAYOUT
CHAPTER 2 : LITERATURE REVIEW
2.0 PARTICULATE MATTER
2.1 SOURCES OF PARTICULATE MATTER
2.1.1 Motor Vehicles
2.1.2 Industry / Power Plants
2.1.3 Open Burning / Trans-Boundary
2.2 CHARACTERISTICS OF PARTICULATE MATTER
2.3 EFFECT OF PM10 ON HUMANS
2.4 WEATHER INFLUENCE
2.4.1 Wind Speed
2.4.2 Temperature And Sunlight
2.4.3 Relative Humidity
2.5 REGRESSION MODELS
2.5.1 Multiple Linear Regression
2.5.2 Robust Regression
2.5.3 Quantile Regression
2.6 ARTIFICIAL NEURAL NETWORK
2.6.1 Feedforward Backpropagation
2.6.2 General Regression Neural Network
2.7 HYBRID MODEL
2.7.1 Principal Component Analysis
2.8 CONCLUSION
CHAPTER 3 : METHODOLOGY
3.0 INTRODUCTION
3.1 STUDY AREA
3.2 MONITORING RECORD ACQUISITIONS
3.3 PARAMETERS SELECTION
3.4 MONITORING RECORD SCREENING
3.5 DESCRIPTIVE STATISTICS
3.5.1 Box and Whisker Plot
3.5.2 One Way Analysis of Variance
3.7 REGRESSION MODELS
3.7.1 Multiple Linear Regression Models
3.7.2 Robust Regression Models
3.7.3 Quantile Regression Models
3.8 ARTIFICIAL NEURAL NETWORK MODELS
3.8.1 Feedforward Backpropagation Models
3.8.2 General Regression Neural Network
3.9 PRINCIPAL COMPONENT ANALYSIS
3.10 HYBRID MODELS
3.11 PERFORMANCE INDEX
3.12 DEVELOPMENT OF A NEW PREDICTIVE TOOL
CHAPTER 4 : RESULT
4.0 INTRODUCTION
4.1 CHARACTERISTIC OF MONITORING RECORD
4.1.1 Descriptive Statistics
4.1.2 Box and Whisker Plot
4.1.3 One Way Analysis of Variance (ANOVA)
4.2 REGRESSION MODELS
4.2.1 Multiple Linear Regression Model
4.2.2 Robust Regression Models
4.2.3 Quantile Regression Models
4.3 ARTIFICIAL NEURAL NETWORK MODEL
4.3.1 Feedforward Backpropagation Models
4.3.2 General Regression Neural Network Models
4.4 APPLICATION OF HYBRID MODELS
4.4.1 Principal Component Analysis
4.4.2 Principal Component Analysis and Multiple Linear Regression
4.4.3 Principal Component Analysis and Robust Regression
4.4.4 Principal Component Analysis and Quantile Regression
4.4.5 Principal Component Analysis and Feedforward Backpropagation
4.4.6 Principal Component Analysis and General Regression Neural Network
4.5 VERIFICATION OF MODELS
4.6 DETERMINING THE MOST SUITABLE MODEL
4.7 DEVELOPING A NEW PREDICTIVE TOOL FOR FUTURE PM10 CONCENTRATIONS PREDICTION IN MALAYSIA
CHAPTER 5 : DISCUSSION
5.0 INTRODUCTION
5.1 REGRESSION MODELS
5.2 ARTIFICIAL NEURAL NETWORK MODELS
5.3 HYBRID MODELS
5.4 THE MOST SUITABLE MODEL
5.5 DEVELOPING A NEW PREDICTIVE TOOL FOR FUTURE PM10 CONCENTRATIONS PREDICTION IN MALAYSIA
CHAPTER 6 : CONCLUSION AND FUTURE WORK
6.1 CONCLUSION
6.2 LIMITATION AND FUTURE WORK
REFERENCES
LIST OF PUBLICATION
LIST OF TABLES
Table 1.1 Malaysia Air Pollution Index (API)
Table 1.2 API intervals, description of air quality, and relationship with PM10 values
Table 1.3 Monitoring stations coordinates and description
Table 2.1 Summary of international wildfire
Table 2.2 Comparison of effect on human health for PM2.5 and PM10
Table 2.3 Three estimation methods for robust regression
Table 2.4 Comparison of the five methods
Table 2.5 Comparison of PM10 models using daily monitoring records
Table 2.6 Comparison of PM10 models using hourly monitoring records
Table 3.1 Summarization of parameters selection by previous researchers
Table 3.2 Percentage of missing value for each station
Table 3.3 ANOVA formula
Table 3.4 Total number of monitoring records for each site (in days)
Table 3.5 Information collection of new monitoring records using DRM
Table 3.6 Weighting function equations for robust regression
Table 3.7 Performance indicators
Table 4.1 Descriptive statistics for all monitoring stations
Table 4.2 Result for ANOVA
Table 4.3 Result of Duncan multiple range test
Table 4.4 Result of Duncan multiple range test for 2002
Table 4.5 Result of Duncan multiple range test for 2004
Table 4.6 Result of Duncan multiple range test for 2005
Table 4.7 Result of Duncan multiple range test for 2006
Table 4.8 Model summary of PM10: D+1
Table 4.9 Model summary of PM10: D+2
Table 4.10 Model summary of PM10: D+3
Table 4.11 Result for ANOVA: D+1
Table 4.12 Result for ANOVA: D+2
Table 4.13 Result for ANOVA: D+3
Table 4.14 The performance indicator values for MLR model
Table 4.15 Performance indicators for D+1 PM10 concentration prediction using RR models in Seberang Jaya
Table 4.16 Ranking of performance indicators for D+1 PM10 concentration prediction using RR models in Seberang Jaya
Table 4.17 Summary of the best model for robust regression: D+1
Table 4.18 Summary of the best model for robust regression: D+2
Table 4.19 Summary of the best model for robust regression: D+3
Table 4.20 Quantile values of variables
Table 4.21 Coefficient of Quantile Regression Models for Seberang Jaya: D+1
Table 4.22 Performance indicators for D+1 PM10 concentration prediction using QR models at Seberang Jaya (step one)
Table 4.23 Performance indicators for D+1 PM10 concentration prediction using QR models at Seberang Jaya (step two)
Table 4.24 Summary of the best model for quantile regression: D+1
Table 4.25 Summary of the best model for quantile regression: D+2
Table 4.26 Summary of the best model for quantile regression: D+3
Table 4.27 Validation of FFBP models using different numbers of neurons at Seberang Jaya
Table 4.28 Result for NAE based cross validation method
Table 4.29 Result for FFBP models using different transfer functions at Seberang Jaya (D+1)
Table 4.30 Summary of the best FFBP model: D+1
Table 4.31 Summary of the best FFBP model: D+2
Table 4.32 Summary of the best FFBP model: D+3
Table 4.33 GRNN result using different smoothing factors for D+1 at Seberang Jaya (step one)
Table 4.34 GRNN result using different smoothing factors for D+1 at Seberang Jaya (step two)
Table 4.35 Summary for the best GRNN model for all prediction days
Table 4.36 Kaiser-Meyer-Olkin Statistics
Table 4.37 Bartlett's test of Sphericity
Table 4.38 Total variance explained for Seberang Jaya (D+1)
Table 4.39 Total variance explained for all monitoring sites
Table 4.40 Rotated component matrix for Perai monitoring station
Table 4.41 Rotated component matrix for Kuching monitoring station
Table 4.42 Rotated component matrix for Nilai monitoring station
Table 4.43 Rotated component matrix for Seberang Jaya monitoring station
Table 4.44 Rotated component matrix for Kuala Terengganu monitoring station
Table 4.45 Rotated component matrix for Bachang monitoring station
Table 4.46 Rotated component matrix for Jerantut monitoring station
Table 4.47 Summary model of PCA-MLR: D+1
Table 4.48 Summary model of PCA-MLR: D+2
Table 4.49 Summary model of PCA-MLR: D+3
Table 4.50 Performance indicators for D+1 PM10 concentration prediction using PCA-RR at Seberang Jaya
Table 4.51 Ranking of performance indicators for D+1 PM10 concentration prediction using PCA-RR in Seberang Jaya
Table 4.52 Summary of the best model for PCA-RR: D+1
Table 4.53 Summary of the best model for PCA-RR: D+2
Table 4.54 Summary of the best model for PCA-RR: D+3
Table 4.55 Performance indicators for D+1 PM10 concentration prediction using PCA-QR at Seberang Jaya (step one)
Table 4.56 Performance indicators for D+1 PM10 concentration prediction using PCA-QR at Seberang Jaya (step two)
Table 4.57 Summary of the best model for PCA-QR: D+1
Table 4.58 Summary of the best model for PCA-QR: D+2
Table 4.59 Summary of the best model for PCA-QR: D+3
Table 4.60 Validation of PCA-FFBP models using different numbers of hidden nodes at Seberang Jaya
Table 4.61 Result of PCA-FFBP using different transfer functions at Seberang Jaya
Table 4.62 Summary of the best model for PCA-FFBP: D+1
Table 4.63 Summary of the best model for PCA-FFBP: D+2
Table 4.64 Summary of the best model for PCA-FFBP: D+3
Table 4.65 Performance indicator for PCA-GRNN using different smoothing function (step one)
Table 4.66 PCA-GRNN result using different smoothing factors: D+1 (step two)
Table 4.67 Summary of performance indicator for PCA-GRNN for all sites
Table 4.68 Comparing performance indicator between validation and verification for all models at Seberang Jaya
Table 4.69 Performance indicator for all models for Perai
Table 4.70 Performance indicator for all models for Kuching
Table 4.71 Performance indicator for all models for Nilai
Table 4.72 Performance indicator for all models for Seberang Jaya
Table 4.73 Performance indicator for all models for Kuala Terengganu
Table 4.74 Performance indicator for all models for Bachang
Table 4.75 Performance indicator for all models for Jerantut
Table 4.76 Summary of the best model for prediction of future PM10 concentration for all seven monitoring stations
Table 5.1 Summary of the best regression model for the prediction of future PM10 concentration for all seven monitoring stations
Table 5.2 Average accuracy of regression models based on type of land use
Table 5.3 Summary of the best ANN models for predicting future PM10 concentration for all seven monitoring stations
Table 5.4 Average accuracy of ANN models based on type of land use
Table 5.5 Summary of the best ANN models for the prediction of future PM10 concentration for all seven monitoring stations
Table 5.6 Average accuracy of ANN models based on type of land use
Table 5.7 Summary of the most suitable models for predicting future PM10 concentration for all seven monitoring stations
Table 5.8 Average accuracy of ANN models based on type of land use
Table 5.9 Results for Tukey's-B Test
LIST OF FIGURES
Figure 1.1 Malaysia Annual Average Concentration of PM10, 1999-2011
Figure 1.2 Number of unhealthy days for seven selected sites, 2001 - 2010
Figure 2.1 PM10 Emission Loads by source (in metric tonnes), 2003-2011
Figure 2.2 Number of registered vehicles in Malaysia from 2004 to 2011
Figure 2.3 Number of registered vehicles in Malaysia by category, from 2004 to 2011
Figure 2.4 Industrial air pollution sources by year (2001 to 2011)
Figure 2.5 Schematic diagram of particle classifications, size distribution, formation and elimination processes, modes of distribution, and composition
Figure 2.6 Inhalation of Particulate Matter: (a) PM > 10 µm, (b) 1 µm < PM ≤ 10 µm and (c) PM ≤ 1 µm
Figure 3.1 Research flow for study procedure
Figure 3.2 Map of research area
Figure 3.3 Schematic diagram of Beta Attenuation Monitor (BAM 1020)
Figure 3.4 Standard box and whisker plot
Figure 3.5 Illustration of ordinary least squares (OLS)
Figure 3.6 Procedure for development of multiple linear regression models
Figure 3.7 Scatter plot of simple linear regression and robust regression
Figure 3.8 Procedure for development of robust regression (RR) models
Figure 3.9 A plot of quantile regression
Figure 3.10 Procedure for development of quantile regression (QR) models
Figure 3.11 Procedure for development of feedforward backpropagation (FFBP) models
Figure 3.12 Architecture of a feedforward backpropagation (FFBP) neural network
Figure 3.13 Illustration of cross validation technique
Figure 3.14 Sigmoid transfer function
Figure 3.15 Procedure for development of general regression neural network (GRNN) models
Figure 3.16 Architecture of general regression neural network
Figure 3.17 Procedure for development of principal component analysis (PCA)
Figure 3.18 Original axis and new axis using varimax rotation
Figure 3.19 Architecture of a hybrid model for the prediction of PM10 concentrations
Figure 3.20 Flow chart for development of new applications (software)
Figure 4.1 Box and whisker plot of PM10 concentrations
Figure 4.2 Histogram for PM10 residual for D+1
Figure 4.3 Histogram for PM10 residual for D+2
Figure 4.4 Histogram for PM10 residual for D+3
Figure 4.5 Scatter plot of residual versus fitted values for D+1
Figure 4.6 Scatter plot of residual versus fitted values for D+2
Figure 4.7 Scatter plot of residual versus fitted values for D+3
Figure 4.8 Architecture of a hybrid model for the prediction of PM10 concentrations
Figure 4.9 Interface for Future Daily PM10 concentrations system
Figure 4.10 Pop-up menu for site selection
Figure 4.11 Pop-up menu for method selection
Figure 4.12 Dynamic input monitoring record
Figure 4.13 New windows to confirm method and site selection
Figure 4.14 Prediction of PM10 concentration for D+1, D+2 and D+3
Figure 5.1 Comparing the average accuracy between hybrid and single models for D+1, D+2 and D+3
Figure 5.2 Comparing between ANN, MLR and hybrid models
LIST OF ABBREVIATIONS
API      Air Pollution Index
ANOVA    Analysis of Variance
ANN      Artificial Neural Network
ASMA     Alam Sekitar Malaysia Sdn. Bhd.
BAM      Beta Attenuation Monitor
BCG      Bachang
BKE      Butterworth Kulim Expressway
CO       Carbon monoxide
D-W      Durbin-Watson
DoE      Department of Environment (Malaysia)
DRM      Direct Reading Monitor
EPA      Environmental Protection Agency
FFBP     Feedforward Backpropagation
GRNN     General Regression Neural Network
GUI      Graphical User Interface
IA       Index of Agreement
ILP      Institut Latihan Perindustrian
JRT      Jerantut
KCH      Kuching
KMO      Kaiser-Meyer-Olkin
KTG      Kuala Terengganu
MAAQG    Malaysian Ambient Air Quality Guidelines
MAD      Median Absolute Deviation
MLP      Multi Layer Perceptron
MLR      Multiple Linear Regression
NAE      Normalized Absolute Error
NLI      Nilai
NO2      Nitrogen Dioxide
O3       Ozone
OLS      Ordinary Least Squares
PCA      Principal Component Analysis
PC       Principal Component
PI       Performance Indicators
PLUS     Projek Lebuhraya Utara Selatan
PM2.5    Particulate matter less than 2.5 µm
PM10     Particulate matter less than 10 µm
QR       Quantile Regression
RBF      Radial Basis Function
RR       Robust Regression
R2       Coefficient of Determination
RH       Relative Humidity
RMSE     Root Mean Square Error
RRMSE    Relative Root Mean Square Error
SD       Standard Deviation
SLR      Simple Linear Regression
SJY      Seberang Jaya
PRI      Perai
SO2      Sulphur Dioxide
SSE      Sum of Squares Due to Error
SSR      Sum of Squares Due to Regression
SST      Total Sum of Squares
T        Temperature
USEPA    United States Environmental Protection Agency
VIF      Variance Inflation Factor
WHO      World Health Organization
SHORT-TERM PREDICTION OF PM10 CONCENTRATIONS USING REGRESSION MODELS, ARTIFICIAL NEURAL NETWORK MODELS AND HYBRID MODELS
ABSTRAK
Particulate matter has a significant effect on human health when its concentration exceeds the air quality guidelines in Malaysia. This study focuses only on particulate matter with an aerodynamic diameter of less than 10 µm, known as PM10. Statistical models are needed to forecast future PM10 concentrations. The aim of this study is to develop models and predict PM10 concentrations for the next day (D+1), the next two days (D+2) and the next three days (D+3) for four categories of area, namely industrial areas (three stations), urban areas (two areas), one sub-urban area and one reference area. This study uses daily average observations from 2001 to 2010. Three main methods were used to develop the PM10 prediction models, namely regression, artificial neural network and hybrid models. Three regression models were used: multiple linear regression (MLR), robust regression (RR) and quantile regression (QR). Feedforward backpropagation neural network (FFBP) and general regression neural network (GRNN) were used as the artificial neural networks. The hybrid models combine principal component analysis (PCA) with all five prediction methods, namely PCA-MLR, PCA-QR, PCA-RR, PCA-FFBP and PCA-GRNN. The results for the regression models show that RR and QR are better than MLR and can be considered alternative methods when the assumptions of MLR cannot be met. The results for the artificial neural networks show that FFBP is better than GRNN. The hybrid models give better results than the single prediction models in terms of accuracy and error. Finally, a new prediction application was developed to forecast future PM10 concentrations using the ten prediction models obtained, with average accuracies for D+1 (0.7930), D+2 (0.6926) and D+3 (0.6410). This application will help local authorities take appropriate action to reduce PM10 concentrations and will also serve as an early warning system.
PM10 CONCENTRATIONS SHORT TERM PREDICTION USING REGRESSION, ARTIFICIAL NEURAL NETWORK AND HYBRID MODELS
ABSTRACT
Particulate matter has a significant effect on human health when its concentration exceeds the Malaysian Ambient Air Quality Guidelines. This research focused on particulate matter with an aerodynamic diameter of less than 10 µm, namely PM10. Statistical models are required to predict future PM10 concentrations. The aims of this study are to develop models and predict future PM10 concentrations for the next day (D+1), the next two days (D+2) and the next three days (D+3) at seven selected monitoring stations in Malaysia, which represent four different types of land use, i.e. industrial (three sites), urban (two sites), a sub-urban site and a reference site. This study used daily average monitoring records from 2001 to 2010. Three main types of model were used for predicting PM10 concentrations, i.e. regression, artificial neural network and hybrid models. The regression methods were multiple linear regression (MLR), robust regression (RR) and quantile regression (QR), while feedforward backpropagation (FFBP) and general regression neural network (GRNN) were used as the artificial neural networks. The hybrid models combine principal component analysis (PCA) with all five prediction methods, i.e. PCA-MLR, PCA-QR, PCA-RR, PCA-FFBP and PCA-GRNN. Results from the regression models show that RR and QR are better than MLR and can act as alternative methods when the assumptions of MLR are not satisfied. The artificial neural network results show that FFBP is better than GRNN. The hybrid models gave better results than the single models in terms of accuracy and error. Lastly, a new predictive tool for future PM10 concentrations was developed using ten models for each site, with average accuracy for D+1 (0.7930), D+2 (0.6926) and D+3 (0.6410). This application will help local authorities take proper action to reduce PM10 concentrations and will serve as an early warning system.
CHAPTER 1
INTRODUCTION
1.0 INTRODUCTION
Air pollution has significant effects on human health, agriculture and ecosystems (Mohammed, 2012). There are numerous reports on the effects of air pollution on human health, agricultural crops, forest species and ecosystems. Several large cities in Malaysia have recorded ambient air quality readings that are increasing and exceeding the national ambient air quality standard (Afroz et al., 2003).
Malaysia has 52 monitoring stations maintained by the Department of Environment Malaysia (2012). All stations provide hourly measurements of particulate matter with an aerodynamic diameter less than or equal to 10 µm (PM10), ozone (O3), sulphur dioxide (SO2), carbon monoxide (CO) and nitrogen dioxide (NO2). This study focuses on PM10 concentration because PM10 has significant impacts on human health, agriculture and buildings (Lee, 2010). Fellenberg (2000), Godish (2004) and Tam and Neumann (2004) found that negative health effects such as asthma, nose and throat irritations, allergies, respiratory-related illnesses and premature mortality were clearly related to PM10. Sedek et al. (2006) found that PM10 had a negative impact on the productivity of short-cycle plants such as vegetables.
1.1 AIR POLLUTION IN MALAYSIA
The Department of Environment (DOE) Malaysia uses the Air Pollution Index (API) to compare air quality with other countries in the region. The API was adopted after the Department of Environment Malaysia revised its index system in 1996. The API closely follows the Pollutant Standards Index (PSI) system of the United States (Department of Environment Malaysia, 1996), as shown in Table 1.1. Afroz et al. (2003) reported that the main air pollutants in Malaysia are carbon monoxide (CO), sulphur dioxide (SO2), nitrogen dioxide (NO2) and particulate matter with an aerodynamic diameter of less than 10 µm.
Table 1.1: Malaysia Air Pollution Index (API) (Source: Department of Environment, Malaysia, 2012)
API                  Description
0 < API ≤ 50         Good
50 < API ≤ 100       Moderate
100 < API ≤ 200      Unhealthy
200 < API ≤ 300      Very Unhealthy
> 300                Hazardous
Sansuddin (2010), Ramli et al. (2001) and Awang et al. (2000) indicated that PM10 is the main contributor to haze events. Consequently, when the PM10 concentration level is higher than the Malaysian Ambient Air Quality Guidelines (MAAQG), the government, under the National Haze Action Plan, can announce a warning status for locations with prolonged APIs exceeding 101 for more than 72 hours (Perimula, 2012). For this reason, this research predicts PM10 concentrations up to three days (72 hours) ahead. Malaysia's safe concentration for PM10 is based on the Department of Environment Malaysia (2002) guidelines: 150 µg/m3 over a 24-hour average and 50 µg/m3 over 1 year. Table 1.2 shows the relationship between API and PM10 concentrations in Malaysia.
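The warning rule above can be expressed as a check over a run of hourly API readings. The sketch below is a minimal illustration only. It assumes hourly API values in time order, a strict "greater than 101" comparison taken literally from the text, and that the 72 hours must be consecutive. The function names and these interpretations are hypothetical; they are not taken from the National Haze Action Plan itself.

```python
def longest_exceedance_run(hourly_api, threshold=101):
    """Length (in hours) of the longest run of consecutive hourly API
    readings strictly above `threshold` (assumed interpretation)."""
    longest = current = 0
    for api in hourly_api:
        if api > threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # a reading at or below the threshold breaks the run
    return longest


def warning_status(hourly_api, threshold=101, min_hours=72):
    """Assumed reading of the rule: warn when the API stays above
    `threshold` for MORE than `min_hours` consecutive hours."""
    return longest_exceedance_run(hourly_api, threshold) > min_hours


# 73 consecutive hours at API 120 triggers the assumed rule; 72 hours does not.
print(warning_status([120] * 73))  # True
print(warning_status([120] * 72))  # False
```

Treating the 72 hours as one unbroken run is the stricter reading. A cumulative count over a window would be an equally plausible alternative interpretation.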
Table 1.2: API intervals, description of air quality, and relationship with PM10 values (Modified from the Department of Environment, Malaysia (2012))
API                  Description        PM10 values (µg/m3)
0 < API ≤ 50         Good               0 < PM10 ≤ 75
50 < API ≤ 100       Moderate           75 < PM10 ≤ 150
100 < API ≤ 200      Unhealthy          150 < PM10 ≤ 350
200 < API ≤ 300      Very Unhealthy     350 < PM10 ≤ 420
300 < API ≤ 500      Hazardous          420 < PM10 ≤ 600
> 500                Very Hazardous     > 600
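The PM10 bands in Table 1.2 can be encoded as a simple lookup. The sketch below is a minimal illustration that returns only the air quality description for a PM10 value. It does not compute a numeric API value, because the conversion formula is not given here. The constant and function names are hypothetical, and the averaging period of the input value is not specified in Table 1.2.

```python
# Inclusive upper bound of each PM10 band in Table 1.2 (ug/m3),
# paired with the air quality description for that band.
PM10_BANDS = [
    (75, "Good"),
    (150, "Moderate"),
    (350, "Unhealthy"),
    (420, "Very Unhealthy"),
    (600, "Hazardous"),
]


def describe_pm10(pm10):
    """Return the Table 1.2 air quality description for a PM10 value (ug/m3)."""
    if pm10 <= 0:
        raise ValueError("Table 1.2 only covers PM10 values above 0")
    for upper, description in PM10_BANDS:
        if pm10 <= upper:  # intervals are open below, closed above
            return description
    return "Very Hazardous"  # PM10 > 600


for value in (40, 150, 151, 700):
    print(value, describe_pm10(value))
```

Keeping the bands in one ordered list means that any revision of the guideline bands only touches the data, not the lookup logic.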
The annual average PM10 concentrations for Malaysia from 1999 until 2011 are shown in Figure 1.1. The annual average concentration is below the Malaysian ambient air quality guideline for PM10 in every year except 2002, when the value equals the guideline of 50 µg/m3. Figure 1.1 also shows that the number of monitoring sites increased from 45 sites in 1999 to 52 sites in 2011.
Year    Annual average PM10 (µg/m3)    Number of sites
1999    (illegible)                    45
2000    40                             50
2001    44                             50
2002    50                             50
2003    44                             51
2004    48                             51
2005    49                             51
2006    49                             51
2007    43                             51
2008    42                             51
2009    45                             51
2010    39                             52
2011    43                             52
Figure 1.1 Annual Average Concentration of PM10 for Malaysia from 1999-2011 (Department of Environment Malaysia, 2012); reference line: Malaysian Ambient Air Quality Guidelines for PM10 = 50 µg/m3
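The comparison with the guideline in Figure 1.1 is simple arithmetic on the plotted values. The sketch below re-checks it using the values transcribed from the figure. The 1999 value is omitted because it is illegible in the source. The sketch also computes the 2001-2010 mean over the study period, a quantity that is not reported in the text. The variable names are hypothetical.

```python
# Annual average PM10 concentrations (ug/m3) transcribed from Figure 1.1.
# 1999 is omitted because its value is illegible in the source figure.
ANNUAL_PM10 = {
    2000: 40, 2001: 44, 2002: 50, 2003: 44, 2004: 48, 2005: 49,
    2006: 49, 2007: 43, 2008: 42, 2009: 45, 2010: 39, 2011: 43,
}
MAAQG_ANNUAL = 50  # annual PM10 guideline, ug/m3

# Years whose annual average reaches or exceeds the guideline.
at_or_above = [year for year, value in ANNUAL_PM10.items() if value >= MAAQG_ANNUAL]

# Years covered by the monitoring records used in this study.
study_period = {y: v for y, v in ANNUAL_PM10.items() if 2001 <= y <= 2010}

lowest_year = min(ANNUAL_PM10, key=ANNUAL_PM10.get)
study_mean = sum(study_period.values()) / len(study_period)

print(at_or_above)            # [2002]
print(lowest_year)            # 2010
print(round(study_mean, 1))   # 45.3
```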
This section discusses the annual average PM10 concentrations for Malaysia from 2001 until 2010, the period covered by the data used in this study.
In 2001, the Department of Environment, Malaysia, stated that overall air quality was good to moderate. Only a few days were identified as unhealthy, because PM10 and ozone were higher than the MAAQG (50 µg/m3) in July 2001 (dry season). Klang reported seven unhealthy days and Kuala Selangor experienced eight unhealthy days in 2001, because PM10 was high due to forest fires and other burning activities (Department of Environment Malaysia, 2002). Sabah and Sarawak experienced unhealthy air quality in June and July 2001 due to open burning from shifting agriculture activities (Department of Environment Malaysia, 2002).
Heil (2007) identified major fires in West Kalimantan from August to November 2002. These fires increased the number of unhealthy days in Sarawak from three to eight, due to particulate matter from trans-boundary haze pollution (Sansuddin, 2010).
The overall air quality in 2002 dropped in comparison to the previous year. PM10 and ozone were the prevalent pollutants in Malaysia. In Kuala Selangor, unhealthy air quality was caused by high PM10 in the air. However, no unhealthy days were reported from the east coast of Malaysia in 2002 (Department of Environment Malaysia, 2003).
The Department of Environment Malaysia (2004) stated that slightly improved overall air quality was observed in 2003 compared to the previous year. In Penang, PM10 and SO2 were the main causes of unhealthy days, due to intensive industrial activities in the area. In 2003, trans-boundary haze pollution did not affect the air quality in Sarawak and Sabah as it had in previous years (Department of Environment Malaysia, 2004).
In 2004, the Department of Environment, Malaysia stated that PM10 was the prevalent pollutant in Malaysia. It caused moderate haze in June, August and September due to trans-boundary pollution from forest fires in Sumatra, as reported by the ASEAN Specialised Meteorological Centre. Fires in Kalimantan also affected southern Sarawak (Department of Environment Malaysia, 2005).
Several parts of Malaysia experienced haze episodes from mid-May until mid-October 2005, caused by forest and land fires in the Riau Province of Central Sumatra, Indonesia (Sansuddin, 2010; Md Yusof, 2009). The central, eastern and northern parts of the country experienced severe haze between 1st August 2005 and 15th August 2005. On 11th August 2005, the air pollution index exceeded 500 in Kuala Selangor and Pelabuhan Klang, caused by peat land fires in Selangor (Md Yusof, 2009). Other haze episodes affected the overall air quality in Malaysia in 2005, which ranged between moderate and good levels (Department of Environment Malaysia, 2006).
Hyer and Chew (2010) identified that the high particulate events between July and October 2006 were caused by trans-boundary pollution from forest fires in Sumatra and Kalimantan. In the Klang Valley, all of the unhealthy air quality days in 2006 (25 days) had PM10 as the predominant pollutant, during the south-westerly monsoon (Department of Environment Malaysia, 2007).
The Malaysia Environmental Quality Report (Department of Environment Malaysia, 2008) reported that the overall air quality in 2007 improved significantly compared to the previous year, owing to favourable weather conditions (weak to medium La Niña) and the absence of trans-boundary haze pollution. The main pollutants were ground-level ozone and PM10.
Sansuddin (2010) observed slightly improved air quality in 2008 compared to the previous year, due to an intensive surveillance programme and preventive measures undertaken by the Department of Environment Malaysia. Furthermore, no trans-boundary haze pollution was observed in 2008. PM10 and ozone remained the main pollutants responsible for the unhealthy days recorded in the Klang Valley, Negeri Sembilan, Perak, Kedah, Pulau Pinang and Johor in 2008. During this period, PM10 came from peat-land burning during dry periods and from motor vehicle emissions.
In 2009, the mean PM10 concentration increased slightly from 42 µg/m3 in 2008 to 45 µg/m3 in 2009. This increase was due to peat-land fires and trans-boundary air pollution during hot and dry conditions (moderate to strong El Niño), especially between June and August 2009. However, the annual PM10 average of 45 µg/m3 remained below the Malaysian Ambient Air Quality Guideline value of 50 µg/m3 (Department of Environment Malaysia, 2010).
The overall air quality in 2010 improved significantly (39 µg/m3) compared to the previous year (45 µg/m3). Higher PM10 values were recorded in several areas of Johor and Melaka in October 2010 due to trans-boundary haze pollution (Department of Environment Malaysia, 2011). However, the annual PM10 average in 2010 was only 39 µg/m3, the lowest value recorded since 1999 (Donham, 2000; Radon et al., 2001).
The number of unhealthy days for the seven selected sites from 2001-2010 is shown in Figure 1.2. The highest numbers of unhealthy days were recorded in 2002, 2004, 2005 and 2006 because of the high particulate events in those years. The main contributor to the unhealthy days in 2002 was the major fires on the west coast of Kalimantan (Sansuddin, 2010). Trans-boundary pollution from forest and land fires in Sumatra and Kalimantan contributed to the unhealthy days in 2004, 2005 and 2006 (Sansuddin, 2010; Md Yusof, 2009; Department of Environment Malaysia, 2005 and 2007). In the other years, the unhealthy days were caused by industrial activities, emissions from motor vehicles and open burning from shifting agriculture activities.
[Chart: number of unhealthy days (y-axis) by year, 2001-2010 (x-axis), for each of the seven selected sites]
Figure 1.2 Number of unhealthy days for seven selected sites, 2001 - 2010
1.2 PROBLEM STATEMENT
In Malaysia, the Department of Environment Malaysia (DOE, Malaysia) is the government body responsible for monitoring air quality. The Department of Environment Malaysia monitors air quality continuously through 52 stations located in urban, sub-urban and industrial areas and in a background area. These monitoring stations are placed at strategic locations to detect any significant change in air quality. Malaysia and other countries have guidelines for allowable levels of air pollutants (Department of Environment Malaysia, 2012). In Malaysia these are known as the Malaysian Ambient Air Quality Guidelines (MAAQG). Under these guidelines, the threshold value of PM10 for a safe level is 150 µg/m3 for a 24-hour averaging time and 50 µg/m3 per year. Short-term and chronic effects on human health may occur when the concentration levels of air pollutants exceed the air quality guidelines (QUARG, 1996; Lee et al., 2010).
Nasir et al. (1998) reported the estimated negative health effects of the 1997 haze episode in Malaysia: 285,277 cases of asthma attacks, 118,804 cases of bronchitis in children and 3,889 cases in adults, in addition to 2,003 respiratory hospital admissions and 26,864 emergency room visits. The World Health Organization (1998) reported that outpatient treatment for respiratory disease at Kuala Lumpur General Hospital increased from 250 to 800 per day, and that outpatient visits in Sarawak increased two- to three-fold during the 1997 haze episode. Besides that, Brauer and Jamal (1998) found that the 1997 haze episode also resulted in increases in asthma, conjunctivitis and acute respiratory infection.
Md Yusof (2009) stated that PM10 primarily reduces visibility through light scattering. Visibility is significantly and strongly correlated with the mass concentrations of nitrate, elemental carbon and sulphate (Kim et al., 2006).
For these reasons, the effects of PM10 on human health and the environment have been studied by researchers worldwide, and particulate matter (PM10) has become a challenge for Malaysia's air quality management. One of the most important efforts in PM10 monitoring is the development of PM10 forecasting models, and statistical modelling could offer good insights for predicting future PM10 concentration levels in Malaysia.
The aims of this study are to develop models and predict future PM10 concentrations for the next day (D+1), the next two days (D+2) and the next three days (D+3). The number of studies on predicting PM10 concentrations in Malaysia is still limited. This study provides PM10 forecasting models using three main approaches, i.e. regression, artificial neural network and hybrid models. The regression methods used were multiple linear regression (MLR), robust regression (RR) and quantile regression (QR), while feedforward backpropagation (FFBP) and general regression neural network (GRNN) were used as the artificial neural network methods. The hybrid models combine principal component analysis (PCA) with each of the five prediction methods, i.e. PCA-MLR, PCA-QR, PCA-RR, PCA-FFBP and PCA-GRNN. This research also developed a new predictive tool for predicting future PM10 concentrations in selected areas in Malaysia up to three days in advance. The models could be easily implemented for public health protection by providing early warnings to the respective populations. In addition, the models can help the authorities to take preventive measures against the impacts of air pollution in Malaysia.
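The hybrid idea (PCA followed by a prediction method) can be illustrated for the PCA-MLR case. The sketch below is an assumption-laden illustration, not the exact procedure of this thesis: it standardises the predictors, keeps the leading principal components of their correlation matrix, and fits ordinary least squares on the component scores. The D+h target is built by pairing the predictors on day D with PM10 on day D+h. The function names, the column order (RH, WS, NO2, T, PM10, SO2, CO) and the random data are hypothetical.

```python
import numpy as np


def align_horizon(X_daily, pm10_col, horizon):
    """Pair predictors on day D with PM10 on day D+horizon (hypothetical helper)."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1 day")
    return X_daily[:-horizon], X_daily[horizon:, pm10_col]


def pca_mlr_fit(X, y, n_components):
    """Standardise X, project onto the leading principal components, then fit OLS."""
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mu) / sd
    # The covariance of standardised data is the correlation matrix of X.
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]              # decreasing explained variance
    W = eigvecs[:, order[:n_components]]           # loadings, shape (p, k)
    A = np.column_stack([np.ones(len(Z)), Z @ W])  # intercept + PC scores
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return mu, sd, W, beta


def pca_mlr_predict(model, X):
    """Apply the stored standardisation and loadings, then the OLS coefficients."""
    mu, sd, W, beta = model
    return beta[0] + ((X - mu) / sd) @ W @ beta[1:]


# Synthetic daily records with seven columns; column 4 plays the role of PM10.
rng = np.random.default_rng(0)
X_daily = rng.normal(size=(365, 7))
X, y = align_horizon(X_daily, pm10_col=4, horizon=1)  # D+1 target
model = pca_mlr_fit(X, y, n_components=3)
print(pca_mlr_predict(model, X_daily[-1:]))  # forecast for the day after the record
```

When `n_components` equals the number of predictors, the fitted values coincide with plain MLR because the rotation is invertible; the hybrid only behaves differently once minor components are dropped, which is where PCA helps with correlated predictors such as NO2 and CO.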
1.3 OBJECTIVES
The objectives of this research are given below:
1. To apply multiple linear regression, robust regression and quantile regression to predict PM10 concentrations.
2. To apply artificial neural network (ANN) techniques, i.e. feedforward backpropagation (FFBP) and general regression neural network (GRNN), to predict PM10 concentrations.
3. To create hybrid models by combining the regression models and ANN models with principal component analysis (PCA).
4. To determine the most suitable model for predicting future (D+1, D+2 and D+3) PM10 concentrations.
5. To develop a new predictive tool for predicting future PM10 concentrations in Malaysia.
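Objective 2 refers to the general regression neural network. A GRNN is commonly described as Gaussian-kernel regression: each training sample acts as a pattern unit, and the output is the kernel-weighted average of the training targets. The sketch below assumes that formulation, with a single smoothing parameter `sigma` and Euclidean distance on (ideally standardised) inputs; the function name is hypothetical and this is not the network configuration used in this thesis.

```python
import numpy as np


def grnn_predict(X_train, y_train, X_query, sigma):
    """GRNN prediction as Gaussian-kernel weighted averaging.

    Pattern layer: one Gaussian unit per training sample.
    Summation layer: weighted sum of targets and plain sum of weights.
    Output layer: the ratio of the two sums.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    # Squared Euclidean distance from each query row to each training row.
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Subtracting each row's minimum leaves the ratio unchanged but keeps at
    # least one weight equal to 1, which avoids 0/0 when sigma is very small.
    d2 -= d2.min(axis=1, keepdims=True)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return (w @ y_train) / w.sum(axis=1)


X_tr = [[0.0], [1.0], [2.0]]
y_tr = [10.0, 20.0, 30.0]
print(grnn_predict(X_tr, y_tr, [[1.0], [0.4]], sigma=0.5))
```

The only tuning parameter is `sigma`: a small value makes each prediction follow the nearest training day, while a large value pushes every prediction towards the mean of the training targets.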
1.4 SCOPE OF RESEARCH
There are many methods for developing prediction models for air pollutant concentration data. The most commonly used in air pollutant modelling are multiple linear regression and neural networks. Nowadays, hybrid models have become more popular as prediction models. All these methods were used in this research to develop models and predict future PM10 concentrations for D+1, D+2 and D+3. Seven stations were chosen for this research, namely Perai, Jerantut, Kuala Terengganu, Seberang Jaya, Nilai, Bachang and Kuching. These stations represent four groups: industrial areas (Perai, Nilai and Kuching), urban areas (Kuala Terengganu and Bachang), a sub-urban area (Seberang Jaya) and a background station (Jerantut). Table 1.3 shows the coordinates and a basic description of the monitoring stations.
Table 1.3: Monitoring stations coordinates and description
Station ID | Monitoring Station | Category | Station Name | Coordinates
CA003 | Perai (PRI) | Industry | Sek. Keb. Taman Inderawasih | N 05° 23.4704', E 100° 23.1977'
CA004 | Kuching (KCH) | Industry | Depot Ubat, Kuching | N 01° 33.7696', E 110° 23.3740'
CA007 | Jerantut (JRT) | Background | MMS, Batu Embun, Jerantut | N 03° 58.2482', E 102° 20.8891'
CA009 | Seberang Jaya (SJY) | Sub-urban | Sek. Keb. Seberang Jaya 2, Perai | N 5° 24.4476', E 100° 24.0403'
CA010 | Nilai (NLI) | Industry | Taman Semarak (Phase 2), Nilai | N 02° 49.3001', E 101° 48.6894'
CA034 | Kuala Terengganu (KTG) | Urban | Sek. Keb. Chabang Tiga, Kuala Terengganu | N 5° 20.2341', E 103° 9.4564'
CA043 | Bachang (BCG) | Urban | Sek. Men. Tun Tuah, Bachang | N 02° 12.7850', E 102° 14.0585'
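The coordinates in Table 1.3 are given in degrees and decimal minutes. For mapping or distance calculations they are usually converted to decimal degrees (degrees + minutes/60). The sketch below is a minimal illustration, assuming all stations lie north of the equator and east of Greenwich, as listed; the dictionary keys are illustrative labels only.

```python
def dm_to_decimal(degrees, minutes):
    """Convert degrees and decimal minutes to decimal degrees (N and E positive)."""
    return degrees + minutes / 60.0


# (latitude N, longitude E) as (degrees, decimal minutes), taken from Table 1.3.
STATIONS = {
    "CA003 Perai": ((5, 23.4704), (100, 23.1977)),
    "CA004 Kuching": ((1, 33.7696), (110, 23.3740)),
    "CA007 Jerantut": ((3, 58.2482), (102, 20.8891)),
    "CA009 Seberang Jaya": ((5, 24.4476), (100, 24.0403)),
    "CA010 Nilai": ((2, 49.3001), (101, 48.6894)),
    "CA034 Kuala Terengganu": ((5, 20.2341), (103, 9.4564)),
    "CA043 Bachang": ((2, 12.7850), (102, 14.0585)),
}

for name, (lat, lon) in STATIONS.items():
    print(f"{name}: {dm_to_decimal(*lat):.5f} N, {dm_to_decimal(*lon):.5f} E")
```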
In this research, future daily PM10 concentrations (PM10,D+1, PM10,D+2 and PM10,D+3) were used as the dependent variables, and seven parameters were chosen as independent variables, namely relative humidity (RH), wind speed (WS; km/hr), nitrogen dioxide (NO2; ppm), temperature (T; °C), PM10 (µg/m³), sulphur dioxide (SO2; ppm) and carbon monoxide (CO; ppm). The monitoring records used in this research were obtained from the Department of Environment Malaysia for the period 2001 to 2010.
1.5 THESIS LAYOUT
This thesis consists of six chapters, and a brief outline of each chapter is as follows:
Chapter 1 discusses the overview of air pollution in Malaysia, problem
s