SOCIAL NETWORKS EVENT MINING FRAMEWORK FOR PEACEBUILDING APPLICATION

BY

MUNIBA MEMON

A thesis submitted in fulfillment of the requirement for the degree of Doctor of Philosophy in Computer Science

Kulliyyah of Information and Communication Technology International Islamic University Malaysia

April 2017


ABSTRACT

Peace provides the freedom to express our views, to relate to other people, and to build cooperation, and social networks (SNs) provide a platform for doing so. SNs can play a significant role in improving the peacebuilding (Pb) process: recent peace-related studies show that peace and crisis reports are communicated through different SNs, and that people and victims of crises commonly use SNs and their applications to convey their feelings. However, the main challenge with SNs is managing the enormous volume of SN data and extracting topic-specific information from it. Research on the Pb process using a social network event mining (SNEM) approach is lacking. Therefore, the objectives of this research are to propose, design, and implement a framework that extracts peace-related events from the vast volume of SN data, analyzes user sentiment about those events, and clusters the events further into sub-events. The framework is based on three proposed algorithms: 1) a data extraction and emerging event detection algorithm; 2) a sentiment analysis (SA) algorithm; and 3) a clustering and tag cloud algorithm. To implement the proposed framework, the algorithms were built into an application named TEMiner. Using TEMiner, this research automatically extracted specific Pb events (tweets) from the millions of tweets posted per day, processed user sentiment about those events, and clustered the data by identified event. Furthermore, visualization techniques were used to present the results from different perspectives and to provide a user-friendly graphical user interface that supports quick analysis of the results. The experimental results show that the proposed framework is viable for extracting real-time events, clustering them into sub-events, and analyzing user sentiment about a specific extracted event, topic, or product.


ABSTRACT IN ARABIC

Peace provides freedom of expression and enables us to communicate with others and to build bridges of cooperation, while social networks provide frameworks for putting this into practice. Social networks can play a major role in developing the structure of peace. Current peace-related studies indicate that reports of peace and of critical crises travel across a variety of social networks. People in general, and victims of crises in particular, usually use these social networks and their applications to convey their views and express their feelings. However, the main shortcoming of these social networks lies in the enormous volume of data from which topic-specific information has to be extracted, and many for-profit and non-profit organizations work on strategies for those topics to overcome these problems. Nevertheless, there is still insufficient research that addresses this matter using a data mining approach to analyze the content of these social networks. The aim of this research is therefore to propose, design, and implement a framework to predict peace-related events from the enormous volume of data in these social networks, to analyze user sentiment about those events, and to gather events that are not apparent on the surface. The framework relies on three proposed algorithms: 1) data extraction and detection of the resulting events; 2) analysis of sentiments and opinions; and 3) clustering and tag cloud algorithms. To validate the proposed framework, the algorithms were delivered through an application named TEMiner. The research sample was extracted automatically, following purposive random sampling rules, from millions of tweets posted on Twitter in a single day, and the users' sentiments and opinions about the identified events were then processed using TEMiner. In addition, visualization techniques were used to represent the results from different perspectives and to design a friendlier user interface that supports quick analysis of the results. The experimental results thus show that the proposed framework is suitable for extracting events in real time, grouping them into sub-events, and analyzing user opinions about a specific extracted event, topic, or product.


APPROVAL PAGE

The thesis of Muniba Memon has been approved by the following:

_____________________________

Norsaremah Salleh Supervisor

_____________________________

Lili Marziana Abdullah Co-Supervisor

_____________________________

Azzeddine Messikh Internal Examiner

_____________________________

………

External Examiner

_____________________________

Saim Kayadibi Chairman


DECLARATION

I hereby declare that this thesis is the result of my own investigations, except where otherwise stated. I also declare that it has not been previously or concurrently submitted as a whole for any other degrees at IIUM or other institutions.

Muniba Memon

Signature ... Date ...


COPYRIGHT PAGE

INTERNATIONAL ISLAMIC UNIVERSITY MALAYSIA

DECLARATION OF COPYRIGHT AND AFFIRMATION OF FAIR USE OF UNPUBLISHED RESEARCH

SOCIAL NETWORKS EVENT MINING FOR PEACEBUILDING APPLICATION

I declare that the copyright of this thesis is jointly owned by the student and IIUM.

Copyright © 2017 Muniba Memon and International Islamic University Malaysia. All rights reserved.

No part of this unpublished research may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise without prior written permission of the copyright holder except as provided below:

1. Any material contained in or derived from this unpublished research may be used by others in their writing with due acknowledgement.

2. IIUM or its library will have the right to make and transmit copies (print or electronic) for institutional and academic purposes.

3. The IIUM library will have the right to make, store in a retrieval system and supply copies of this unpublished research if requested by other universities and research libraries.

By signing this form, I acknowledge that I have read and understood the IIUM Intellectual Property Right and Commercialization policy.

Affirmed by Muniba Memon

……..……….. ………..

Signature Date


ACKNOWLEDGEMENTS

Firstly, it is my utmost pleasure to dedicate this work to my dear parents and my family, who granted me the gift of their unwavering belief in my ability to accomplish this goal: thank you for your support and patience.

The completion of my Ph.D. has been an arduous and wonderful journey. First and foremost, I would like to thank my adviser, Dr. Norsaremah Salleh. She supported and helped me more times than I can count, always responded quickly to my questions, and stayed in contact. Her encouragement and advice helped me produce quality research work. She was supportive through the paper-writing process, rejections, editing, and re-submissions. She is a wonderful professor, researcher, and friend. Thank you, Dr. Norsaremah, for always being with me in all my difficult times and throughout this PhD journey. I might not have been able to complete this degree without your support.

Furthermore, I would like to thank my co-advisor, Dr. Lili Marziana, for her valuable guidance in completing this thesis. I am also grateful to my thesis committee chairman, Dr. A. Wahab, for his support. I would like to thank my friends, especially Dr. Ikhlas F. Zamzami, Dr. Adamu Abubakar and Yaqoob Koondhar, for their constant motivation, which helped me complete this work successfully. Finally, I would like to thank my family, who supported me through almost 4 years of higher education.


TABLE OF CONTENTS

Abstract
Abstract in Arabic
Approval Page
Declaration
Copyright Page
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Acronyms
List of Publications
List of Poster Papers

CHAPTER ONE: INTRODUCTION
1.1 Background and Basic Concept
1.1.1 Peacebuilding
1.1.2 Social Networks
1.1.3 Sentiment Analysis
1.1.4 Content Analysis
1.2 Statement of the Problem
1.3 Research Objectives
1.4 Research Questions
1.5 Research Contributions
1.6 Thesis Organization

CHAPTER TWO: A SYSTEMATIC LITERATURE REVIEW OF SOCIAL NETWORKS EVENT MINING FOR PEACEBUILDING APPLICATIONS
2.1 Introduction
2.2 Systematic Literature Review Process
2.2.1 SLR Planning Phase (Protocol of SLR)
2.2.2 Execution / Conducting of SLR Process
2.2.3 Write-Up of SLR Process and Results
2.3 SLR’s Research Questions
2.4 Identification of Relevant Literature
2.4.1 Strategy Used to Derive Search Terms
2.4.2 The Searching Process
2.4.3 Documentation of Search Process or References
2.5 Study Selection Criteria
2.5.1 Inclusion Criteria
2.5.2 Exclusion Criteria
2.6 Data Extraction Strategy
2.6.1 Data Extraction Form
2.7 Study Quality Assessment Checklist
2.8 The Results of the Review
2.8.1 Synthesis of Evidence
2.9 Discussion
2.10 Conclusions

CHAPTER THREE: RESEARCH METHODOLOGY
3.1 Introduction
3.2 Design Science Research Methodology
3.2.1 Applications of DSR Methodology in Computer Science Domain
3.3 Design Science Research Process Model
3.3.1 DSR Process Steps
3.3.2 Knowledge Flows
3.3.3 Output
3.4 Application of Design Science Research Method to this Research

CHAPTER FOUR: ALGORITHMS FOR DATA EXTRACTION, SENTIMENT ANALYSIS AND EMERGING EVENT DETECTION FROM TWITTER STREAMS
4.1 Introduction
4.2 Proposed Data Extraction Algorithm
4.2.1 Tweet Loader
4.2.2 Emerging Event Detection
4.2.3 Advance TEQuery Boolean Search Query
4.3 Data Pre-processing for Events Detection
4.3.1 Pre-processing
4.3.2 Sliding Window Technique
4.4 Sentiment Analysis
4.4.1 ANEW Dictionary-Based Approach
4.5 Sentiment Analysis Approach
4.5.1 The DBSCAN Clustering

CHAPTER FIVE: EXPERIMENTAL PROCEDURE AND PRESENTATION OF RESULTS
5.1 Introduction
5.2 Data Extraction
5.3 Pre-Processing and Data Cleaning
5.4 Proposed Twitter Event Sentiment Analyzer
5.5 Presentation of Results

CHAPTER SIX: CONCLUSIONS AND FUTURE WORK
6.1 Conclusion
6.2 Future Research Direction
6.2.1 Automatic Twitter Event Authentication System
6.2.2 Visual Summary of Evolving Events by Timelinereview Feature
6.2.3 Extend TEMiner to Other Social Networks
6.3 Final Remarks

REFERENCES

APPENDIX-A: Data Extraction Form
APPENDIX-B: Studies used in SLR
APPENDIX-C: Sample Twitter Dataset
APPENDIX-D: Published Work from this Thesis


LIST OF TABLES

Table No.

1 Summary of PICOC for formation of research questions
2 Summary of derived search terms from PICOC
3 Search terms derived from keywords found in papers (sorted by relevance)
4 Search terms derived based on synonym words
5 Concatenation of alternative terms using Boolean OR
6 Concatenation of all possible terms by using Boolean AND
7 Details of the Search Engine Results
8 Procedure for documenting the search process, adopted from (Norsaremah Salleh, 2008)
9 Study Quality Checklist
10 List of SNs used in studies
11 List of context or settings used
12 List of datasets available in English
13 List of datasets per language
14 List of Data Collection Tools or APIs
15 Design Evaluation Methods (Alan R. Hevner et al., 2004)
16 Process steps with outputs
17 Expected Outcomes of SNEM for Pb using DSR
18 Tweet-attributes saved in TMine.
19 Advance TEQuery Boolean search Filter


LIST OF FIGURES

Figure No.

1 Overview of Systematic Literature Review process
2 Results of the paper screening process (145 of 1396 papers selected as relevant)
3 Design Science Research Process Model (Vaishnavi & Kuechler, 2015)
4 Research Design Model SNEM
5 Proposed TEMiner Application's algorithmic Framework
6 TLoader process of TEMiner Application
7 Subset of Tweet attributes of Tweet-status field in JSON format
8 TEQuery process of TEMiner Application for Emerging Event Detection
9 Basic TEQuery Boolean query User Interface of TMiner App
10 List of all Active TEQuery: Classification of tweets into events
11 Advance TEQuery Boolean query User Interface of TMiner App
12 Sentiment Map
13 Sub-Event Detection and Sentiment Analysis Approach (clustering)
14 TEMiner Clustering process
15 TEMiner using basic searching strategy
16 TLoader panel to show past and on-going event queries of TEMiner
17 Event Searching based on keyword
18 Basic Method to detect Tweet Event based on Location
19 Basic Method to detect Tweet Event
20 Advanced Tweet-Event detection method with Boolean operators
21 Advanced Tweet-Event detection method
22 Tweet Event database with filtered attributes
23 Clean TweetText database as per event detected
24 Sentiment Map of TESA algorithm
25 The ANEW dictionary database
26 Sentiment visualization of TEMiner App
27 Clustering Visualization of TEMiner App
28 Tag Cloud Term visualization of TEMiner App
29 Timeline view of tweet events


LIST OF ACRONYMS

Pb Peacebuilding

SNs Social Networks

SNEM Social Networks Event Mining

SA Sentiment Analysis

CA Content Analysis

SLR Systematic Literature Review

DSR Design Science Research

TEMiner Twitter Event Miner

TESA Twitter Event Sentiment Analyzer

TD Topic Detection

DBSCAN Density-based spatial clustering of applications with noise


LIST OF PUBLICATIONS

Published Papers

1 Paper 1: Shaikh M., Salleh N., and Marziana L. 2014. Social Networks Peacebuilding Event Mining: A Systematic Literature Review. In Proceedings of IEEE (CPS), 3rd International Conference on Advanced Computer Science Applications and Technologies, ACSAT 2014, pages (119-124). [Scopus indexed]

2 Paper 2: Shaikh M., Salleh N., and Marziana L. 2015. Social Networks Event Mining: A Systematic Literature Review. Advances in Intelligent Systems and Computing. Vol. 355, pages (169-177). [Scopus indexed]

3 Paper 3: Shaikh M., Salleh N., and Marziana L. 2015. Social Networks Content Analysis for Peacebuilding Application. Advances in Intelligent Systems and Computing. Vol. 355, pages (193-200). [Scopus indexed]

4 Paper 4: Muniba Shaikh, Humaira Dar, Asadullah Shaikh, and Asadullah Shah. Adjusted Edit Distance Algorithm for Alias Detection. In International Conference on Information and Knowledge Management (ICIKM) 2012, pages 118-122.


LIST OF POSTER PAPERS

1. Poster 1: Muniba Shaikh and Asadullah Shah (2013), An Integrated Social Communication System for Peacebuilding, POSTER presentation in International Islamic University Malaysia (IIUM) KICT Postgraduate Student Colloquium, December 2013, Kuala Lumpur, Malaysia. (This POSTER won the Silver Medal in the event).

2. Poster 2: Norsaremah Salleh, Muniba Shaikh, Lili Marziana. Poster Id-296. An Integrated Social Communication System for Peacebuilding. International Research, Invention and Innovation Exhibition (IRIIE) 2014. Organized by: International Islamic University Malaysia (IIUM), Malaysia, 11-12 June, 2014.


CHAPTER ONE INTRODUCTION

1.1 BACKGROUND AND BASIC CONCEPT

Peace provides the freedom to express our views, to relate to other people, and to build cooperation. Social Networks (SNs) provide an important platform for doing so. SNs such as Facebook and Twitter have become a significant source of up-to-date information and an immense medium for investigating the kinds of events (news) that interest a broad audience (Popescu & Pennacchiotti, 2010). Twitter is a well-known real-time microblogging application that lets its users share short messages of no more than 140 characters, known as “tweets”. Users normally write tweets to express their opinions on a variety of topics related to their daily lives (Lai, 2010). Twitter has expanded widely and gained broad acceptance among users in the last few years (Olariu, 2012). Reports show that in March 2012 Twitter received more than 340 million posts per day from its 140 million active users¹, compared with an average of about 90 million posts per day in September 2010.

Facebook reports more than 900 million active users², while Tumblr receives about 70 million posts per day³. Among the microblogging platforms, Twitter receives a higher number of posts per day, and scientific research focuses more on Twitter for data analysis because it makes posts available for extraction, whereas most Facebook posts are private. Because Twitter posts (tweets) are restricted to 140 characters, the limit greatly influences how users write.

1 http://blog.twitter.com/2012/03/twitter-turns-six.html

2 http://mashable.com/2012/04/23/facebook-now-has-901-million-users/

3 http://www.tumblr.com/press


Internet slang, misspelled words, and abbreviations are common in tweets, and the writing style is mostly informal. For instance, the tweet “Aww :) I didt show i had a voicemail til i turned my phone off and on again smh RT @anonymous_user: #youknowitsloveif I left you a voicemail” contains a retweet (a repost of another user's post, introduced by the keyword “RT”), a reference to a different user (marked by the symbol “@”), an emoticon (the characters “:)”), an acronym (“smh”, short for “shaking my head”), a hashtag (a Twitter-specific tag marked by the number sign “#”), and several errors (such as “didt”, “til”, and the lowercased “i”).
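The surface features listed for this tweet can be picked out with a few regular expressions. The following is a minimal sketch only: the patterns, the `tweet_features` function, and the feature names are illustrative assumptions, not the extraction logic used by TEMiner, and real tweets would need more robust handling.

```python
import re

# Hypothetical patterns for illustration; they cover only the simple
# cases that appear in the example tweet.
RETWEET = re.compile(r"\bRT\s+@\w+")       # "RT" followed by a user reference
MENTION = re.compile(r"@(\w+)")            # user reference, e.g. @name
HASHTAG = re.compile(r"#(\w+)")            # hashtag, e.g. #topic
EMOTICON = re.compile(r"[:;=][\-^]?[)(DPp]")  # a few common Western emoticons


def tweet_features(text):
    """Return the retweet flag, mentions, hashtags, and emoticons of a tweet."""
    return {
        "is_retweet": bool(RETWEET.search(text)),
        "mentions": MENTION.findall(text),
        "hashtags": HASHTAG.findall(text),
        "emoticons": EMOTICON.findall(text),
    }


example = (
    "Aww :) I didt show i had a voicemail til i turned my phone off "
    "and on again smh RT @anonymous_user: #youknowitsloveif I left you a voicemail"
)
features = tweet_features(example)
print(features)
```

Acronyms such as “smh” and misspellings such as “didt” are not covered by this sketch; handling them would presumably require a slang dictionary or a spelling model.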

In response to this growing influx of information, a number of web applications have appeared that take different approaches to summarizing and extracting the most significant news. For example, Summify⁴ generates a summary of daily news from SNs and emails users about the topics they have specified. This helps users keep track of the news most relevant to them among the hundreds of millions of tweets posted daily on Twitter.

Extracting significant and meaningful information from microblogging streams has also become a focus of many scientific studies. Several algorithms have been proposed to detect the highlights of events such as sports matches, games, elections, and natural disasters (J. Nichols et al., 2012; D. Chakrabarti et al., 2011; H. Takamura et al., 2011) or to generate a description for a hashtag (#) (B. Sharif et al., 2010). To the best of our knowledge, no study has extracted, clustered, and visualized Pb events with respect to location, timeline, and user sentiment.

Therefore, this research addresses the problem of clustering streams of any type or category, not only those filtered on specific criteria. The approach combines an event detection module with clustering and sentiment analysis algorithms. The strategies presented in this research can be incorporated (customized) into an SNEM system that analyzes data extracted from Twitter to detect opinions and to cluster crisis, health, weather, or product tweets, depending on the application.

4 http://summify.com/

The major contribution of this research is to combine event mining, text clustering, and sentiment analysis into a simple yet efficient algorithm for visualizing microblogging streams as event clusters with sentiments. To the best of our knowledge, this is a novel Social Network Event Mining (SNEM) algorithm that requires no restrictions on, or prior information about, the analyzed stream to generate a comprehensive output. To highlight the improvement, the solution is evaluated against manually tagged Twitter posts that were collected automatically by the developed application.

1.1.1 Peacebuilding

Peacebuilding (Pb) has been practiced for the past six decades by United Nations (UN) peacekeeping and by organizations from different countries in many conflict-torn nations all over the world (Nations, 2008). The term Pb has no single precise definition; several definitions are presented in the literature. Field practitioners, organizations, scholars, and policymakers each have their own opinions and conceptions of Pb and of the tasks, tools, and techniques it entails (Nations, 2008). According to the UN peacekeeping department, Pb includes a variety of measures that address the core issues of society that could otherwise lead to conflict later. It is a long-term process of sustaining peace and development by providing the means to resolve the deep-rooted structural causes of intensive conflicts in a comprehensive manner (Nations, 2008).

Furthermore, the UN's definition of Pb is as follows: "Peacebuilding involves a range of measures targeted to reduce the risk of lapsing or relapsing into conflict by strengthening national capacities at all levels for conflict management, and to lay the foundation for sustainable peace and development. Peacebuilding is a complex and long-term process of creating the necessary conditions for sustainable peace. It works by addressing the deep-rooted, structural causes of violent conflict in a comprehensive manner. Peacebuilding measures address core issues that affect the functioning of society and the State, and seek to enhance the capacity of the State to effectively and legitimately carry out its core functions" (Nations, 2008).

Some organizations explain Pb as an overarching concept that includes early warning and response efforts, advocacy work, violence prevention, military intervention, civilian and military peacekeeping, humanitarian assistance, conflict transformation, ceasefire agreements, and the establishment of peace zones (Nations, 2008). However, Dambach, CEO of the Alliance for Peacebuilding, explains: "Peacebuilding is the set of initiatives taken by different actors in government and civil society to address the main causes of violence and protect civilians before, during, and after the violent conflict. Peacebuilders use communication, negotiation, and mediation instead of belligerence and violence to resolve conflicts. Effective Peacebuilding is multifaceted and adapted to each conflict environment. There is no path to peace, but pathways are available in every conflict environment. Peacebuilders help belligerents to find a path that will enable them to resolve their differences without bloodshed. The ultimate objective of Peacebuilding is to reduce and eliminate the frequency and severity of violent conflict" (Dambach, 2012).

1.1.2 Social Networks

Social Networks (SNs) are one of the major communication channels nowadays. Millions of people use SNs to meet, connect, and share knowledge, ideas, and experiences (D. M. Boyd & Ellison, 2007). Users of well-known and popular SNs such as Instagram, Myspace, Facebook, and Google+ mostly use these networks for entertainment or for sharing content and news (Matuszka, Vinceller, & Laki, 2013).

Previous research indicates that users of most SNs use the platform to stay in touch with friends and to share pictures, so the usage, or scope, of these SNs is becoming narrow (D. Boyd, 2006; Ibrahim & Salim, 2013). Microblogging SNs such as Twitter and Tumblr, on the other hand, are predominantly used for campaigning and communication (Liao et al., 2012). Dedicated and specialized SNs such as LinkedIn, which is used primarily by professionals, indicate that social networking can provide value-added services to different users in countless ways (M. Shaikh, Salleh, & Marziana, 2015b).

Users participate in SNs to share ongoing activities, events, personal information, and photos, and to take part in trending discussions (Williams & Durrance, 2008). For individuals, SNs are a way to find out people's opinions or sentiments about daily life, commercial products, movies, and elections, and to follow regular updates on events within a short time (Pang & Lee, 2008; Hodeghatta, U. R., 2013; Skoric et al., 2012; Alejandro, J., 2011). Each of these networking technologies, such as Facebook, Twitter, Myspace, and LinkedIn, is still developing and growing rapidly because of the innumerable opportunities and advantages it brings to people, businesses, and organizations.

1.1.3 Sentiment Analysis

Sentiment analysis (SA), or opinion mining, provides useful insight into online communication because it enables researchers to measure emotion in online texts (Ibrahim & Salim, 2013). Information obtained by analyzing sentiment is used by social psychologists who are interested in mining opinions, moods, and attitudes, and in market intelligence, product comparison, opinion summarization, weblog attitude analysis, e-mail sentiment classification, and sentiment filtering of online messages (S. Ramaswamy, 2011). SA is a two-step process: 1) subjectivity classification and 2) sentiment classification. Subjectivity classification is an essential part of SA that distinguishes subjective sentences from objective ones. It removes the sentences (words) that present facts and selects the sentences (words) that express opinions or sentiments (Ibrahim & Salim, 2013). The next step, known as sentiment classification, detects the sentiment polarity of the subjective sentences (Medvet & Bartoli, 2012). Subjective sentences reveal positive or negative polarity and normally indicate the user’s sentiment (Spiliopoulou, Mobasher, Nasraoui, & Zaiane, 2012).
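The two-step process can be illustrated with a toy lexicon-based sketch. Everything here is an assumption made for illustration: the `LEXICON` entries and scores are invented, and the rule "a sentence is subjective if it contains a lexicon word" is a deliberately crude stand-in for a real subjectivity classifier. It is not the sentiment algorithm proposed in this thesis.

```python
import re

# Hypothetical toy lexicon: word -> polarity score (illustrative values only).
LEXICON = {
    "good": 1, "happy": 1, "peace": 1, "love": 1,
    "bad": -1, "sad": -1, "violence": -1, "hate": -1,
}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def is_subjective(sentence):
    """Step 1 (subjectivity classification): keep sentences with an opinion word."""
    return any(token in LEXICON for token in tokenize(sentence))


def polarity(sentence):
    """Step 2 (sentiment classification): sum word scores and map to a label."""
    score = sum(LEXICON.get(token, 0) for token in tokenize(sentence))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "mixed"


def analyze(sentences):
    """Drop objective sentences, then label the remaining ones."""
    return [(s, polarity(s)) for s in sentences if is_subjective(s)]


sample = [
    "The talks start on Monday.",
    "I love this peace initiative.",
    "The violence makes me sad.",
]
results = analyze(sample)
print(results)
```

In this sketch the first sentence is filtered out as objective because it contains no lexicon word, and only the remaining two reach the polarity step.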

1.1.4 Content Analysis

Content Analysis (CA) is a set of procedures for drawing valid inferences from text (Weber, 1990; Shaomei Wu 2011). The overall purpose of CA is to extract context from the content being analyzed by establishing “Who (says) What (to) Whom (in) what Channel and (with) What Effect?” (Lasswell, 1948). Researchers define CA in different ways depending on their domain or field of research; Holsti, for example, defines CA as “a technique for making inferences by objectively and systematically identifying specified characteristics of messages” (Holsti, 1969).
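One simple way to picture "systematically identifying specified characteristics of messages" is rule-based coding, in which each message is assigned to content categories by explicit keyword rules. The sketch below is an assumption made for illustration: the categories, keywords, and function names are hypothetical and are not the coding scheme used in this research.

```python
from collections import Counter

# Hypothetical coding scheme: category -> keywords that trigger it.
CODING_RULES = {
    "conflict": {"attack", "violence", "clash"},
    "peace_process": {"ceasefire", "negotiation", "mediation"},
    "humanitarian": {"aid", "refugees", "relief"},
}


def code_text(text):
    """Return, in sorted order, every category whose keywords appear in the text."""
    cleaned = text.lower().replace(",", " ").replace(".", " ")
    words = set(cleaned.split())
    return sorted(cat for cat, keys in CODING_RULES.items() if words & keys)


def category_counts(texts):
    """Count how many texts fall into each category."""
    counts = Counter()
    for text in texts:
        counts.update(code_text(text))
    return counts


posts = [
    "Ceasefire holds after mediation.",
    "Aid convoy reaches refugees.",
    "New clash reported despite ceasefire.",
]
counts = category_counts(posts)
print(counts)
```

Because the rules are explicit, two analysts applying the same scheme to the same posts would obtain the same category counts, which is what makes the procedure replicable.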

Some general definitions of CA describe this research accurately. CA is a "systematic, replicable data reduction technique, compressing many words of text into content categories based on explicit rules of coding" (Stemler, 2001). According to Reitz, CA is a "Close analysis of explicit and implicit messages of a text through classification and evaluation of key concepts, symbols, and themes to determine meaning and explain its effect on the audience" (Reitz, 2004).

1.2 STATEMENT OF THE PROBLEM

SNs have become today's fastest communication media, and they play a very significant role in spreading news and information around the globe within a short time. The major challenges with SNs are managing the huge amount of data, storing it efficiently, and analysing it productively. Another issue is extracting important topics from specific (Pb-related) data and pre-processing or interpreting that data effectively, because SN data is usually written informally: it uses emoticons, abbreviations, hashtags, URLs, and short forms, and typically does not follow grammatical rules. Sufficient research on social network content analysis from the Pb perspective has not yet been conducted.

Therefore, the objective of this research is to perform sentiment analysis to identify people's opinions, to perform content analysis describing "which (SN), what (data), and how (to extract)", and to perform clustering so that information is easier to retrieve, analyse, and manage. Furthermore, this research will help to answer questions about the features and techniques that should be used for content analysis of Pb-related data.


Ultimately, this research aims to propose a data extraction and content analysis framework for managing SN content and for clustering and classifying data by topic and nature (positivity, negativity, threat, etc.), which will help Pb applications and organizations provide early warnings to people in case of a threat.

1.3 RESEARCH OBJECTIVES

This research aimed to achieve the following objectives:

1. To investigate the state of the art in social network content analysis studies involving conflict- or violence-affected communities for the purpose of Pb.

2. To develop automatic data collection and cleaning methods that can collect relevant data from several SNs.

3. To create our own repository for storing the relevant data collected from open source SNs, in order to train the classifier for future use.

4. To perform sentiment analysis in order to draw the context from SN content that will help in the Pb process.

1.4 RESEARCH QUESTIONS

The research questions addressed by this study are as follows:

1. How has the Pb concept been defined and categorized in events as reported by SNs?

2. Which SNs have been widely used for mining events in the Pb context, and why?

3. How can user-defined, topic-specific events be extracted from SNs?
