A thesis submitted in fulfilment of the requirement for the degree of Master of Computer Science


DETECTING SYNTACTIC AMBIGUITY IN REQUIREMENTS SPECIFICATION USING NAÏVE BAYES TEXT CLASSIFICATION ALGORITHM

BY

KHIN HAYMAN OO


Kulliyyah of Information and Communication Technology

International Islamic University Malaysia

JANUARY 2019


ABSTRACT

Requirements engineering is the process of collecting software requirements from stakeholders, defining user expectations for a new product, and resolving requirements problems such as incompleteness, inconsistency and ambiguity in the Software Requirements Specification (SRS). Ambiguity in an SRS is considered one of the main problems because a single reader may interpret a requirement in more than one way, and multiple readers may interpret it differently, which can lead to confusion and to wasted effort and time. The types of ambiguity are lexical, syntactic, semantic and pragmatic. This research focuses on syntactic ambiguity, which concerns sentence structure and grammar. There are three categories of approach for detecting ambiguities in requirements specifications: the manual approach, the semi-automatic approach using natural language processing techniques, and the semi-automatic approach using machine learning techniques. The manual approach requires considerable effort, human expertise and time, and produces a low defect detection rate. Some natural language processing techniques, on the other hand, cannot be used in practice and produce misleading output when detecting ambiguity in SRS. Hence, the aim of this research is to apply a semi-automatic approach, using the machine learning Naïve Bayes (NB) text classification technique based on n-gram modelling, to detect syntactic ambiguities in SRS, because NB has been reported to perform well and accurately in detecting ambiguity. The findings of this work show that the NB text classifier achieved higher accuracy (80%) than the manual approach (27%) in detecting syntactic ambiguity in SRS.


RESEARCH SUMMARY

ABSTRACT IN ARABIC

Requirements engineering is the process of collecting software requirements from stakeholders, defining user expectations for the new product, and resolving requirements problems such as incompleteness, inconsistency or ambiguity in the software requirements specification. Ambiguity in understanding the requirements specification is considered one of the most important problems, since one person may interpret the same matter in more than one way, and several people may give the same subject several different meanings, which may lead to confusion and to wasted time and effort. Ambiguity can be classified into several types, including lexical, syntactic, semantic and pragmatic ambiguity. This research focuses on the ambiguity related to word composition, sentence structure and grammar, which is called syntactic ambiguity. There are three categories of methods for detecting ambiguity in requirements specification: the manual method, the semi-automatic method using natural language processing techniques, and the semi-automatic method using machine learning techniques. The manual method requires great effort, human expertise and time, and produces a low defect detection rate. In addition, some natural language processing techniques are not practical to use and give misleading results when detecting ambiguity in the software requirements specification. Therefore, the aim of this research is to apply the semi-automatic method using machine learning to detect ambiguity in the software requirements specification, and the Naïve Bayes technique based on the n-gram model was chosen for text classification. The results of the research showed that the adopted method achieved an accuracy of up to 80%, while the manual method achieved 27%, in detecting syntactic ambiguity in the software requirements specification.


APPROVAL PAGE

I certify that I have supervised and read this study and that in my opinion, it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a thesis for the degree of Master of Computer Science.

………

Azlin Nordin

Supervisor

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a thesis for the degree of Master of Computer Science.

………

Siti Asma Binti Mohammed

Internal Examiner

………

Nurul Akmar Emran

External Examiner

This thesis was submitted to the Department of Computer Science and is accepted as a fulfilment of the requirement for the degree of Master of Computer Science.

………

Amelia Ritahani Ismail

Head, Department of Computer Science

This thesis was submitted to the Kulliyyah of Information Communication Technology and is accepted as a fulfilment of the requirement for the degree of Master of Computer Science.

………

Abdul Wahab Abdul Rahman

Dean, Kulliyyah of Information Communication Technology


DECLARATION

I hereby declare that this thesis is the result of my own investigations, except where otherwise stated. I also declare that it has not been previously or concurrently submitted as a whole for any other degrees at IIUM or other institutions.

Khin Hayman Oo

Signature ... Date ...


COPYRIGHT

INTERNATIONAL ISLAMIC UNIVERSITY MALAYSIA

DECLARATION OF COPYRIGHT AND AFFIRMATION OF FAIR USE OF UNPUBLISHED RESEARCH

DETECTING SYNTACTIC AMBIGUITY IN REQUIREMENTS SPECIFICATION USING NAÏVE BAYES TEXT CLASSIFICATION ALGORITHM

I declare that the copyright of this thesis is jointly owned by the student and IIUM.

No part of this unpublished research may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise without prior written permission of the copyright holder except as provided below:

1. Any material contained in or derived from this unpublished research may only be used by others in their writing with due acknowledgement.

2. IIUM or its library will have the right to make and transmit copies (print or electronic) for institutional and academic purposes.

3. The IIUM library will have the right to make, store in a retrieval system and supply copies of this unpublished research if requested by other universities and research libraries.

By signing this form, I acknowledge that I have read and understood the IIUM Intellectual Property Right and Commercialization policy.

Affirmed by Khin Hayman Oo

……..……….. ………..

Signature Date


DEDICATION

This thesis is dedicated to my family


ACKNOWLEDGEMENTS

Firstly, it is my utmost pleasure to dedicate this work to my dear parents and my family, who granted me the gift of their unwavering belief in my ability to accomplish this goal: thank you for your support and patience.

A special thanks to my parents for their continuous support, encouragement and leadership, and for that, I will be forever grateful.

I also wish to express my appreciation and thanks to Assist. Prof. Dr Azlin Nordin and Assoc. Prof. Dr Amelia Ritahani Ismail, as well as Dr Suriani Sulaiman, who gave their time, effort and support to this research project.

Once again, I glorify Allah (S. W. T) for His endless mercy on us, one instance of which is enabling us to successfully complete the writing of this thesis.

Alhamdulillah.

TABLE OF CONTENTS

CHAPTER TWO: LITERATURE REVIEW
2.1 Introduction
2.1.1 Manual Approaches in Detecting Ambiguity
2.1.1.1 Inspection Techniques
2.1.1.2 Review Techniques
2.1.2 Semi-Automatic Approaches using Natural Language Processing
2.1.2.1 Natural Language Patterns (Patterns Matching Techniques)
2.1.2.2 Statistical Machine Translation using N-Gram Model
2.1.3 Semi-Automatic Approaches using Machine Learning
2.1.3.1 Decision-Tree Text Classification Techniques
2.1.3.2 Support Vector Machine (SVM) Text Classification Techniques
2.1.3.3 Naïve Bayes (NB) Text Classification Techniques
2.2 An Analysis of Ambiguity Detection Techniques for Software Requirements Specification (SRS)
2.3 Identifying Heuristic Rules Based on Ambiguity Handbook and Quality Evaluation of Software Requirements Specification
2.4 Naïve Bayes (NB) Text Classifier Algorithm
2.5 Representing N-Gram With Examples and Applying Parsing With Part of Speech (POS) Tagging

CHAPTER THREE: METHODOLOGY
3.2 Research Methodology
3.2.1 Literature Review Analysis
3.2.2 Rules Identification
3.2.3 Machine Learning Experiment
3.2.4 Survey
3.2.5 Evaluate Results

CHAPTER FOUR: EXPERIMENTAL SET UP
4.2 Research Experimental Set Up
4.2.1 Collect Data Set
4.2.2 N-gram Model
4.2.3 Parsing with Part of Speech (POS) Tagging
4.2.4 The Machine Learning Experiment

CHAPTER FIVE: SURVEY TO DETECT AMBIGUITY
5.2 Survey Process
5.2.1 Design Questionnaire
5.2.2 Planning
5.2.3 Conduct the Pilot Study
5.2.4 Update the Questionnaire
5.2.5 Conduct the Real Survey
5.2.6 Analyze Results
5.3 Threats to the Validity

CHAPTER SIX: RESULTS AND ANALYSIS
6.2 Detecting Syntactic Ambiguity using N-Gram Based NB Text Classifier
6.3 Detecting Syntactic Ambiguity Using Survey
6.4 Evaluation Processes
6.4.1 Q1 What is the average accuracy results for both survey and NB text classifier?
6.4.2 Q2 What is the average accuracy for the survey results of the ambiguity rules?
6.4.3 Q3 Which technique is more effective for detecting syntactic ambiguity in SRS in terms of time?
6.5 Discussion

CHAPTER SEVEN: CONCLUSION AND FUTURE WORK
7.2 Research Contribution
7.3 Future Work

REFERENCES
APPENDIX A: AMBIGUOUS TERMS REPOSITORY
APPENDIX B: THE TRAINING DATA SET FOR EXPERIMENT
APPENDIX C: INVITATION LETTER FOR SURVEY
APPENDIX D: PILOT STUDY INSTRUMENT
APPENDIX E: SURVEY INSTRUMENT
APPENDIX F: CALCULATION
APPENDIX G: AMBIGUOUS TERMS REPOSITORY REFERENCES

LIST OF TABLES

Table 2.1 Categories of the Ambiguity Detection Techniques
Table 2.2 Unigram
Table 2.3 Bigram
Table 2.4 Trigram
Table 2.5 Quadrigram
Table 2.6 POS Tagging is Applied to Sentence 1
Table 2.7 POS Tagged to Each Sentence of Quadrigram
Table 2.8 Example of Training Data Set of Quadrigrams
Table 4.1 N-Gram for Tokenization Process
Table 4.2 Applied POS Tagging to Sentence 2
Table 4.3 POS Tagged to Each Word of Sentence of 6-Grams
Table 4.4 The Example of Training Data
Table 4.5 Removing Words from Training Data
Table 4.5.1 Replaced multiple rules with “XX”
Table 4.6 Eliminating Words
Table 5.1 Demographic Background for Pilot Study
Table 5.2 Pilot Study Results
Table 6.1 Demographic Background for Survey
Table 6.2 Evaluation Results for Calculation
Table 6.3 Accuracy results for both NB text classifier and survey
Table 6.4 6-Gram Based POS Patterns for Ambiguity Rules
Table 6.5 Accuracy and Performance Between Manual and NB Text Classifier

LIST OF FIGURES

Figure 3.1 The Research Methodology Diagram
Figure 3.2 Ambiguous Terms Repository Flow
Figure 4.1 Research Experimental Design Setup Diagram
Figure 4.2 The Sample Output
Figure 5.1 Survey to Detect Ambiguity
Figure 5.2 The Snapshot of the Pilot Study Instrument
Figure 5.3 The Snapshot of the Survey Instrument
Figure 6.1 Rule Four Result
Figure 6.2 Other Result
Figure 6.3 Rule One Output from Rule One and Rule Four in a Sentence
Figure 6.4 Rule Three Output from Rule Three and Rule Five in a Sentence
Figure 6.5 Rule Three Output

LIST OF ALGORITHM

Algorithm 1 The Pseudocode for NB Text Classifier Algorithm

CHAPTER ONE: INTRODUCTION

1.1 OVERVIEW

This chapter is divided into seven sections: background of the study, research problem, research questions, research objectives, significance of the research, scope and limitation, and chapter summary.

1.2 BACKGROUND OF THE STUDY

Requirements Engineering (RE) is the process of gathering, analyzing, specifying and validating user requirements (Hussain, Ormandjieva and Kosseim, 2007; A. Nigam, Arya, B. Nigam and Jain, 2012). Requirements gathering is one of the important stages of the software development process. The gathered software requirements, both functional and non-functional, are compiled into a document called the Software Requirements Specification (SRS). The SRS is very important because the success of a software project depends mainly on the SRS, which serves as an input to the design, coding and testing phases of the project (Hussain, Ormandjieva and Kosseim, 2007; A. Nigam, Arya, B. Nigam and Jain, 2012). Moreover, SRS documents serve as a medium for communicating user requirements to the technical personnel responsible for developing the software (Hussain et al., 2007). Most of the time, Natural Language (NL) is used to write the SRS because NL is easy for all stakeholders to understand, but it is inherently ambiguous (Kamsties, Berry and Paech, 2001; Polpinij and Ghose, 2008; Gleich, Creighton and Kof, 2010; Yang, Willis, Roeck and Nuseibeh, 2010; Sharma, Bhatia and Biswas, 2014).


As mentioned by Yang, Willis, Roeck and Nuseibeh (2010), ambiguity arises when a word is interpreted differently by different readers. Kamsties, Berry and Paech (2001) and Bussel (2009) observed that ambiguous words are sometimes not recognizable, so users deal with them without knowing it. This can cause serious problems: the designed software may fail to behave as users intended, and components described in the requirements specification may fail to interact properly.

There are four types of ambiguity in the context of requirements: (1) lexical, (2) syntactic, (3) semantic and (4) pragmatic (Berry, Kamsties and Krieger, 2003; Kiyavitskaya, Zeni, Mich, and Berry, 2008; Gleich, Creighton and Kof, 2010; Singh and Saikia, 2015).

(1) Lexical ambiguity appears when a single word has two or more possible meanings (Berry, Kamsties and Krieger, 2003; Kiyavitskaya, Zeni, Mich, and Berry, 2008; Gleich, Creighton and Kof, 2010; Singh and Saikia, 2015). For example, the word “green” has more than one meaning, such as “green in colour” or “immature”.

(2) Syntactic ambiguity, also known as structural ambiguity, occurs when the words in a sentence can be interpreted in more than one way because of ambiguous sentence structure (Berry, Kamsties and Krieger, 2003; Kiyavitskaya, Zeni, Mich, and Berry, 2008; Gleich, Creighton and Kof, 2010; Singh and Saikia, 2015). For instance, “small car factory” has two meanings: “(small car) factory” or “(small) car factory”.

(3) Semantic ambiguity happens when a sentence can be interpreted in more than one way within its context (Berry, Kamsties and Krieger, 2003; Kiyavitskaya, Zeni, Mich, and Berry, 2008; Gleich, Creighton and Kof, 2010; Singh and Saikia, 2015). For example, the sentence “All citizens should have a social security number” can be interpreted in two ways: “every citizen has an individual social security number” or “all citizens have the same social security number”.

(4) Pragmatic ambiguity arises when a sentence is not described in detail and the contextual information needed to express its meaning is missing (Berry, Kamsties and Krieger, 2003; Kiyavitskaya, Zeni, Mich, and Berry, 2008; Gleich, Creighton and Kof, 2010; Singh and Saikia, 2015). For instance, the sentence “Do you want to have a cup of tea?” can be interpreted in two ways: “Do you feel a desire for a cup of tea?” or “I can make you a cup of tea if you want”.
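The structural nature of the “small car factory” example in item (2) can be made concrete with a short sketch. The function below is purely illustrative and is not part of the method used in this research; the name `bracketings` is hypothetical. It enumerates every way of grouping a word sequence into nested pairs, which for a three-word phrase yields exactly the two readings discussed above.

```python
def bracketings(words):
    """Return every binary bracketing of a word sequence as a string."""
    if len(words) == 1:
        return [words[0]]
    results = []
    # Split the sequence at every position and combine the sub-bracketings.
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                results.append(f"({left} {right})")
    return results


readings = bracketings(["small", "car", "factory"])
for reading in readings:
    print(reading)
```

The number of possible groupings grows quickly with phrase length, which is one reason why structurally ambiguous requirements are hard to spot by reading alone.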

This research investigated syntactic ambiguity in SRS because syntactic ambiguity concerns sentence structure and grammar, whereas the other types of ambiguity (lexical, semantic and pragmatic) concern the meaning of the words in sentences. In addition, sentence structure and grammar are very important in requirements documents because most of the ambiguities found are syntactic in nature, owing to multiple ambiguous words in the requirements documents (Kamsties, Berry and Paech, 2001).

Many techniques have been used to detect ambiguity in SRS. Kamsties, Berry and Paech (2001); Anda and Sjøberg (2002); Popescu, Rugaber, Medvidovic and Berry (2007); Alshazly, Elfatatry and Abougabal (2014); and Singh and Saikia (2015) relied on human reviewers to detect ambiguity in SRS. Brown, Pietra, DeSouza and Lai (1992); Osborne and MacNish (1996); Denger, Berry and Kamsties (2003); Chater and Manning (2006); Lin, Church, Sekine, Yarowsky, Bergsma and Narsale (2010); Clark, Giorgolo and Lappin (2013); Misra and Das (2013); and Arora, Sabetzadeh, Briand and Zimmer (2014) combined human effort with natural language processing techniques. Finally, Romano and Palmer (1998); Nakagawa and Matsumoto (2002); Hussain, Ormandjieva and Kosseim (2007); Moschitti, Riccardi and Raymond (2007); Polpinij and Ghose (2008); Polpinij (2009); Seijas and Segura (2009); A. Khan, Baharudin, Lee and K. Khan (2010); Wang, Dong and Yan (2012); Clark, Giorgolo and Lappin (2013); Rashwan, Ormandjieva and Witte (2013); Subha and Palaniswami (2013); Sharma, Bhatia and Biswas (2014); Gogoi and Sarma (2015); and Rajeswari, Juliet and Aradhana (2017) combined human effort with machine learning techniques for detecting ambiguity in SRS.

Therefore, based on the previous work, it can be concluded that there are three categories of approach for detecting ambiguity in SRS: the manual approach, the semi-automatic approach using natural language processing, and the semi-automatic approach using machine learning. These three approaches are discussed in detail in Section 2.1.

According to Polpinij and Ghose (2008); Wang and Agichtein (2010); Sharma et al. (2014); and Allahyari-Abhari et al. (2015), one semi-automatic machine learning technique, the Naïve Bayes (NB) text classification technique, achieved high accuracy and performed well in detecting ambiguity in SRS. Therefore, this research applied the NB text classifier to detect syntactic ambiguities in SRS.

Hussain, Ormandjieva and Kosseim (2007) described text classification as a method of assigning text documents to two or more groups depending on their features. NB can be applied not only to document classification but also to email classification, spam detection, and the clustering and organization of documents. The NB text classifier model is built on word probabilities and word counts. It assigns a text to a class based on the words that appear in the text, using the word probabilities learned from training data (Polpinij and Ghose, 2008).
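The word-count and word-probability idea described above can be sketched in a few lines of Python. This is a minimal illustration of a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing, not the algorithm or data set used in this research: the function names `train` and `predict`, the class labels and the four toy sentences are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Count classes and per-class words from (tokens, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(tokens, model):
    """Return the class with the highest log posterior for the tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior: fraction of training samples with this label.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            # Add-one smoothing keeps unseen words from zeroing the score.
            count = word_counts[label][tok] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data, invented for this illustration only.
samples = [
    ("the system shall log all or some events".split(), "ambiguous"),
    ("the user and the admin or the guest may log in".split(), "ambiguous"),
    ("the system shall store the password".split(), "unambiguous"),
    ("the system shall display the report".split(), "unambiguous"),
]
model = train(samples)
label = predict("the admin or the guest".split(), model)
print(label)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing term is one common way to handle words that never occurred with a given class in training.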

This research applied the NB text classifier based on n-grams because, according to previous studies, the NB text classifier achieved high accuracy and performed well in detecting ambiguity in SRS. Because the NB text classifier classifies data using word probabilities or word counts, the input should be modelled using n-grams in order to capture the influence of the words before and after each word. An n-gram is a consecutive sequence of tokens. It can be described as a frame of length N that moves over the text. An n-gram of size one is called a “unigram”, size two a “bigram”, size three a “trigram” and size four a “quadrigram”, while sizes of five and more are simply called “n-grams” (Maroulis, 2014). Part of Speech (POS) tagging has been used in many Natural Language Processing (NLP) tasks because a POS tagger tags each word of a sentence with its word category, such as NN (noun), VB (verb) or JJ (adjective) (Gleich, Creighton and Kof, 2010; A. Nigam, Arya, B. Nigam and Jain, 2012; Subha and Palaniswami, 2013). Parts of speech are also called word classes, morphological classes or lexical tags. POS tagging is used in language processing tasks such as information retrieval, shallow parsing, information extraction and linguistic research on corpora, as well as in higher-level NLP tasks such as parsing, semantics and translation (Hasan, UzZaman and Khan, 2007; A. Nigam, Arya, B. Nigam and Jain, 2012). POS tagging can be applied not only to English but also to other languages (Hasan, UzZaman and Khan, 2007; A. Nigam, Arya, B. Nigam and Jain, 2012). Since this research detected syntactic ambiguity in English natural language, parsing with POS tagging was used. The reason is that parsing builds on POS tags and determines sentence structure by showing the structure of the phrases. A detailed discussion of n-grams and parsing with POS tagging, with examples, is provided in Section 2.5.

Hence, this research applied NB text classification based on n-gram and parsing with POS tagging.
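The “frame of length N that moves over the text” can be sketched as follows. This is an illustrative example only: the helper name `ngrams` is hypothetical, the sample sentence is invented, and the POS tags are assigned by hand for the illustration (using the same style of tag names as above) rather than produced by a tagger.

```python
def ngrams(tokens, n):
    """Slide a window of length n over tokens; return each n-gram as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "the system shall display the report"
tokens = sentence.split()

bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
print(bigrams)
print(trigrams)

# Hand-assigned POS tags for the same sentence (an assumption, not tagger output).
tagged = [("the", "DT"), ("system", "NN"), ("shall", "MD"),
          ("display", "VB"), ("the", "DT"), ("report", "NN")]
tag_sequence = [tag for _, tag in tagged]

# The same window can be applied to the tag sequence to obtain POS patterns.
tag_trigrams = ngrams(tag_sequence, 3)
print(tag_trigrams)
```

Applying the window to tags rather than words gives patterns that describe sentence structure independently of the specific vocabulary, which is the kind of feature a structure-oriented classifier can learn from.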

1.3 RESEARCH PROBLEM

Natural Language (NL) has been used in constructing SRS, but NL requirements are generally ambiguous (Kamsties, Berry and Paech, 2001; Polpinij and Ghose, 2008; Gleich, Creighton and Kof, 2010; Yang et al., 2010; Sharma, Bhatia and Biswas, 2014). The quality of the SRS significantly influences, positively or negatively, the design flow and the final product of the software project (Allahyari-Abhari et al., 2015). Hence, ambiguity can easily lower the quality of the SRS, which may lead to errors in the software design (Allahyari-Abhari et al., 2015).

Many approaches are available for detecting ambiguity in SRS. With human reviewers, only 18% of ambiguities were detected when a team of three reviewers spent 4.5 hours detecting ambiguities in requirements documents (Kamsties et al., 2001). Human inspectors using an ad hoc technique found more ambiguities; however, most of the ambiguities found were insignificant (Anda and Sjøberg, 2002). The ad hoc technique is not systematic: it provides no instructions on how to detect ambiguity and depends on human judgment (Anda and Sjøberg, 2002). When the ad hoc technique could not find more ambiguities, the human inspectors continued detecting ambiguities in SRS using a checklist technique. Even though the inspectors spent a lot of time, no further ambiguities were found using the checklist technique (Anda and Sjøberg, 2002). The checklist is used to identify ambiguity in a single requirement (one sentence of the requirements) based on previously identified ambiguity types, and it is also not systematic because it provides no instructions (Kamsties, Berry and Paech, 2001; Popescu, Rugaber, Medvidovic and Berry, 2007; Alshazly, Elfatatry and Abougabal, 2014; Singh and Saikia, 2015). A detailed explanation of inspection and review techniques is provided in Section 2.1.1. Hence, relying on human judgment to detect ambiguity not only requires considerable human effort, time and expertise but can also produce a low defect detection rate.

Furthermore, some Natural Language Processing (NLP) techniques also have disadvantages when used to detect ambiguity in SRS. For example, (i) pattern matching techniques cannot be used in practice without a tool, or can only be applied semi-automatically, when detecting ambiguity in SRS (Denger et al., 2003); and (ii) the N-gram model produced misleading results when detecting ambiguities in SRS (Maroulis, 2014). A detailed analysis of ambiguity detection techniques for SRS is provided in Section 2.2.

1.4 RESEARCH QUESTIONS

The research questions of this work are as follows:

1. What are the terms in requirements that may lead to syntactic ambiguity?

2. Can the Naïve Bayes (NB) text classifier algorithm based on the n-gram model be used to detect syntactic ambiguity in SRS?

3. How effective is the proposed technique in terms of accuracy and performance in detecting syntactic ambiguities in SRS?

1.5 RESEARCH OBJECTIVE

The research objectives of this work are as follows:

1. To investigate syntactic ambiguity in requirements specification;

2. To apply NB text classifier algorithm based on n-gram model in detecting syntactic ambiguity in SRS;

3. To compare the accuracy and performance between manual and semi-automatic techniques in detecting syntactic ambiguity in SRS.

1.6 SIGNIFICANCE OF THE RESEARCH

RE is an important stage of Software Engineering (SE), and it starts with a general understanding of the requirements for producing a software product. RE provides a consistent and complete set of requirements in a regular format and involves requirements elicitation, analysis, documentation and specification. At the end of the RE stage, the Software Requirements Specification (SRS) contains the description of the system (Subha and Palaniswami, 2013).

According to Polpinij and Ghose (2008), a survey of over 8000 projects undertaken by 350 US companies found that about one third of the projects were never completed. Moreover, one half of the projects succeeded only partially, with budget overruns, delays and incomplete functionality. Poor requirements were the major cause of these failures: absence of user participation (13%), requirements incompleteness (12%), changing requirements (11%), impractical expectations (6%) and vague objectives (5%). Furthermore, a survey similar to that of the 350 US companies, covering 3800 organizations in 17 European countries, also concluded that the majority of software problems relate to requirements specification (more than 50%) and requirements management (50%). These findings indicate that requirements problems occur on a large scale. Generally, errors in the RE stage are dangerous and costly. Hence, requirements engineers have suggested that the best way to address these problems is to tackle ambiguous requirements, because ambiguity is a main source of software errors (Polpinij and Ghose, 2008).

This research proposes to detect syntactic ambiguity in SRS by using the NB text classification technique based on the n-gram model. In addition, this research compares the effectiveness of the NB text classification technique with that of human judgment, in terms of time and accuracy, in identifying syntactic ambiguity in requirements sentences.

Therefore, this research could benefit requirements engineers, practitioners and researchers to reduce ambiguities in SRS. Additionally, the NB algorithm could also be integrated with other requirements tools.

1.7 SCOPE AND LIMITATION

This research focuses mainly on detecting syntactic ambiguities in requirements statements written in English, because most requirements sentences in SRS are written in English. Moreover, this research identified heuristic rules from two previously published sources: (i) the handbook From Contract Drafting to Software Specification: Linguistic Sources of Ambiguity (Berry, Kamsties and Krieger, 2003) and (ii) Quality Evaluation of Software Requirements Specification (Fabbrini, Fusani, Gnesi, and Lami, 2001). These rules were fed into the machine learning algorithm as part of the training data. Additionally, these heuristic rules are very basic and important in English grammar. Finally, a total of 15 existing SRS documents from undergraduate students’ RE class projects were used as part of the training and testing data for the machine learning experiment.

1.8 CHAPTER SUMMARY

In brief, this chapter has presented the background of the study and explained why the SRS is important, what problems occur in SRS and how requirements engineers have addressed them. The chapter described the types of ambiguity in SRS and explained why syntactic ambiguity was chosen for this research. The different types of technique for detecting ambiguities in SRS were identified from an analysis of previous work. The research problems were then discussed in order to identify the weaknesses of the existing ambiguity detection techniques, followed by the research questions and objectives. Finally, the significance of the research and its scope and limitation were presented. The next chapter provides a literature review of techniques for detecting ambiguities in SRS.
