DISASTER RECOVERY WITH MINIMUM REPLICA PLAN
BY
MOHAMMAD MATAR ALSHAMMARI
A thesis submitted in fulfilment of the requirement for the degree of Doctor of Philosophy in Information Technology
Kulliyyah of Information and Communication Technology International Islamic University Malaysia
NOVEMBER 2018
ABSTRACT
Cloud computing has emerged as a new paradigm for hosting and delivering computing resources over the Internet. The cloud has become a dominant and preferred method for storing large amounts of data and sharing that data among several users. It also enables the use of pay-as-you-go pricing models. Today’s cloud computing environment requires data centers to increase the amount of available storage. There are two main concerns with cloud storage: data reliability and cost of storage. This research proposes a data replication management approach for multi-cloud environments that determines the number of replicas (which should be fewer than 3) and thereby reduces cloud storage consumption while meeting the data reliability requirements.
Furthermore, it proposes a preventive approach for data backup and recovery that aims to minimize the number of replicas and ensure high data reliability before a disaster occurs. The approach, named Preventive Disaster Recovery Plan with Minimum Replica (PDRPMR), is a cost-effective mechanism that reduces the number of replicas in the cloud to only 1 or 2 without compromising data reliability. The name PDRPMR originates from its preventive actions: checking the availability of replicas and monitoring for denial-of-service attacks to maintain data reliability. Several experiments have been carried out to demonstrate that PDRPMR reduces the amount of storage space used by one-third to two-thirds compared to typical 3-replica strategies, which in turn reduces the cost of storage.
The approach is evaluated using two metrics, the cost of backup storage and the Recovery Time Objective (RTO), which have been used most frequently in the literature. In this thesis we focus on the critical factors that influence the Disaster Recovery (DR) plan, including minimizing storage cost, reducing the RTO, ensuring a high reliability rate and decreasing the number of replicas to fewer than 3 (the typical number of replicas).
ABSTRACT IN ARABIC (ENGLISH TRANSLATION)
Cloud computing has emerged as a new model for hosting and delivering computing resources over the Internet. The cloud has become a dominant and preferred way to store large amounts of data and to enable the sharing of that data among several users. It also allows the use of pay-as-you-go pricing models. Today's cloud computing environment requires data centers to increase the amount of available storage. There are two main concerns with cloud storage: data reliability and cost of storage. This research proposes a data replication management approach for multi-cloud environments that determines the number of replicas (which should be fewer than 3) and reduces cloud storage consumption while meeting the data reliability requirements.
Furthermore, the research proposes a preventive approach for data backup and recovery that aims to minimize the number of replicas and ensure high data reliability before a disaster occurs. The approach, named Preventive Disaster Recovery Plan with Minimum Replica, is a cost-effective mechanism that reduces the number of replicas in the cloud to only 1 or 2 without compromising data reliability. The name of the approach originates from its preventive actions of checking the availability of replicas and monitoring denial-of-service attacks to maintain data reliability. Several experiments have been carried out to demonstrate that the approach reduces the amount of storage space used by one-third to two-thirds compared to typical replication strategies that use 3 replicas, which in turn reduces the cost of storage.
These two metrics have been used in most previous studies. This thesis focuses on the critical factors that influence the disaster recovery plan, including minimizing storage cost, reducing the recovery time objective, ensuring a high reliability rate and decreasing the number of replicas to fewer than 3 (the typical number of replicas).
APPROVAL PAGE
The thesis of Mohammad Matar Alshammari has been approved by the following:
_____________________________
Ali A. Alwan Supervisor
_____________________________
Azlin Nordin Co-Supervisor
_____________________________
Norsaremah Salleh Internal Examiner
_____________________________
Siddeeq Yousif Ameen External Examiner
_____________________________
Jafreezal Jaafar External Examiner
_____________________________
Saim Kayadibi Chairperson
DECLARATION
I hereby declare that this thesis is the result of my own investigations, except where otherwise stated. I also declare that it has not been previously or concurrently submitted as a whole for any other degrees at IIUM or other institutions.
Mohammad Matar Alshammari
Signature ... Date ...
COPYRIGHT
INTERNATIONAL ISLAMIC UNIVERSITY MALAYSIA
DECLARATION OF COPYRIGHT AND AFFIRMATION OF FAIR USE OF UNPUBLISHED RESEARCH
DISASTER RECOVERY WITH MINIMUM REPLICA PLAN
I declare that the copyright of this thesis is jointly owned by the student and IIUM.
Copyright © 2018 Mohammad Matar Alshammari and International Islamic University Malaysia. All rights reserved.
No part of this unpublished research may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission of the copyright holder except as provided below:
1. Any material contained in or derived from this unpublished research may only be used by others in their writing with due acknowledgement.
2. IIUM or its library will have the right to make and transmit copies (print or electronic) for institutional and academic purposes.
3. The IIUM library will have the right to make, store in a retrieval system and supply copies of this unpublished research if requested by other universities and research libraries.
By signing this form, I acknowledge that I have read and understood the IIUM Intellectual Property Right and Commercialization policy.
Affirmed by Mohammad Matar Alshammari
……..……….. ………..
Signature Date
DEDICATION
This thesis is dedicated to my family
ACKNOWLEDGEMENTS
First of all, I am grateful from the core of my heart to Almighty Allah, who made it possible to complete this thesis.
I owe this work to my dear mother, my wife and my family. Without their support, concern, and love, it would have been impossible for me to complete my Ph.D. studies. I especially thank my wife, who encouraged me to pursue my Ph.D.
I am also grateful to my supportive supervisor Assist. Prof. Dr. Ali A. Alwan and co-supervisor Assoc. Prof. Dr. Azlin Nordin Salleh, who have continuously encouraged me throughout my research. I am especially thankful to my main supervisor Dr. Ali, who guided me with great patience and kept encouraging me during my research. He taught me many things, as I enrolled in the Ph.D. program at IIUM with a weak research background. Thank you very much, Dr. Ali, for being my supervisor and mentor.
Finally, I wish to express my appreciation and thanks to those who provided their time, effort and support for this project. To the members of my thesis committee, thank you for sticking with me.
TABLE OF CONTENTS
Abstract ... ii
Abstract in Arabic ... iii
Approval Page ... iv
Declaration ... v
Copyright ... vi
Dedication ... vii
Acknowledgements ... viii
List of Tables ... xii
List of Figures ... xiii
List of Abbreviations ... xv
CHAPTER ONE: INTRODUCTION ... 1
1.1 Overview ... 1
1.2 Problem Statement ... 3
1.3 Research Questions ... 5
1.4 Research Objectives ... 5
1.5 Research Scope ... 6
1.6 Research Significance ... 7
1.7 Organization of the Thesis ... 8
CHAPTER TWO: BACKGROUND AND LITERATURE REVIEW ... 10
2.1 Introduction ... 10
2.2 Cloud Computing Overview ... 10
2.2.1 Definition of Cloud Computing ... 12
2.2.2 Essential Characteristics of Cloud Computing ... 13
2.2.3 Service Models of Cloud Computing ... 14
2.2.4 Deployment Models of Cloud Computing ... 16
2.3 Disaster Recovery Overview ... 17
2.3.1 Definition of Disaster Recovery ... 19
2.3.2 Types of Disaster Recovery ... 20
2.3.3 Importance of Disaster Recovery ... 22
2.3.4 Issues with Disaster Recovery ... 23
2.4 An Overview of Disaster Recovery in Cloud ... 24
2.4.1 Importance of Disaster Recovery in the Cloud ... 25
2.4.2 Issues with Disaster Recovery in the Cloud ... 26
2.4.3 Advantages and Disadvantages of Disaster Recovery in the Cloud ... 29
2.5 Data Reliability in the Cloud ... 29
2.6 Data Disaster Recovery in the Cloud ... 31
2.6.1 Traditional Data Disaster Recovery ... 31
2.6.2 Data Disaster Recovery in the Cloud ... 32
2.6.3 Data Disaster Recovery Models in the Cloud ... 32
2.7 Previous Approaches of Disaster Recovery ... 34
2.8 Existing Studies on Disaster Recovery in the Cloud ... 36
2.9 Previous Works of Data Reliability in the Cloud ... 43
2.10 Previous Approaches of Data Management in Cloud ... 46
2.11 Previous Works on Data Disaster Recovery in the Cloud ... 49
2.12 Summary ... 51
CHAPTER THREE: RESEARCH METHODOLOGY ... 52
3.1 Introduction ... 52
3.2 Methodology of the Research ... 53
3.3 Data Replication Management in a Multi-Cloud ... 56
3.4 Data Backup and Recovery in Multi-Cloud ... 58
3.5 Performance Measurements ... 59
3.6 Cloud Simulator ... 60
3.6.1 CloudSim ... 60
3.6.2 CloudAnalyst ... 61
3.7 Implementation ... 64
3.8 Summary ... 65
CHAPTER FOUR: PROPOSED APPROACH FOR DATA REPLICATION MANAGEMENT IN MULTI-CLOUD ... 66
4.1 Introduction ... 66
4.2 Proposed Approach of Data Replication Management in Multi-Cloud ... 66
4.2.1 Proactive Replica Checking ... 68
4.2.2 Overview of the PDRPMR ... 69
4.2.3 Working Procedure of the PDRPMR ... 73
4.3 Optimization Algorithms in the PDRPMR ... 76
4.3.1 Minimum Replication Algorithm ... 77
4.3.2 Metadata Distribution Algorithm ... 79
4.3.2.1 The maximum capacity of the PDRPMR ... 79
4.3.2.2 Provision of Sufficient Data Reliability Assurance ... 80
4.4 Summary ... 82
CHAPTER FIVE: PROPOSED DATA BACKUP, RECOVERY AND SCHEDULING IN MULTI-CLOUD ENVIRONMENT ... 83
5.1 Introduction ... 83
5.2 System Model of the Proposed Approach ... 84
5.2.1 Architecture of the Proposed Approach ... 85
5.2.2 Data Backup Model ... 85
5.2.3 Data Recovery Model ... 90
5.3 Scheduling Strategy of the Proposed Approach ... 93
5.4 Summary ... 97
CHAPTER SIX: PROPOSED SYSTEMS IMPLEMENTATION AND EVALUATION ... 98
6.1 Introduction ... 98
6.2 System Architecture of the Simulation ... 98
6.3 Experimental Settings ... 101
6.3.1 Experiment Evaluation Metrics ... 102
6.3.2 Experiment Evaluation Scenarios ... 102
6.3.3 Simulation Configuration ... 103
6.4 Experimental Results and Analysis ... 104
6.4.1 Data Replication Results ... 105
6.4.1.1 Cost-Preferred Strategy ... 105
6.4.1.2 RTO-Preferred Strategy ... 107
6.4.1.3 Results Discussion for Data Replication ... 109
6.4.2 Data Backup and Recovery Results ... 111
6.4.2.1 Cost for 1 and 3-Replicas ... 112
6.4.2.2 Cost for 2 and 3-Replicas ... 113
6.4.2.3 RTO for 1 and 3-Replicas ... 114
6.4.2.4 RTO for 2 and 3-Replicas ... 115
6.4.2.5 Results Discussion for Data Backup and Recovery ... 116
6.5 Summary ... 118
CHAPTER SEVEN: CONCLUSIONS AND FUTURE WORK ... 119
7.1 Research Summary ... 119
7.2 Conclusions of Research ... 119
7.3 Contribution of Research ... 121
7.4 Future Work ... 122
REFERENCES ... 125
LIST OF PUBLICATIONS ... 133
LIST OF TABLES
Table 2.1 Standards Platform Recovery ... 20
Table 2.2 The Events Categories ... 21
Table 2.3 Advantages and Disadvantages of Disaster Recovery in the Cloud ... 29
Table 2.4 Summary of Previous Approaches of Disaster Recovery in the Cloud ... 42
Table 2.5 Summary of Previous Work on Data Disaster Recovery in the Cloud ... 51
Table 4.1 Types of Metadata ... 71
Table 6.1 Simulation Setup Requirements ... 101
Table 6.2 Parameter Settings of CPs ... 103
Table 6.3 Simulation Parameters ... 104
Table 6.4 Latency Matrix Values (ms) ... 104
Table 6.5 Bandwidth Matrix Values (Mbps) ... 104
Table 6.6 Cost and RTO Results Comparison in Cost-Preferred Strategy ... 106
Table 6.7 Cost and RTO Results Comparison in RTO-Preferred Strategy ... 108
Table 6.8 Performance Values of Cost and RTO ... 112
LIST OF FIGURES
Figure 2.1 Cloud Computing Fundamentals ... 13
Figure 2.2 Layers of Cloud services ... 16
Figure 2.3 Deployment Model of Cloud Computing ... 16
Figure 2.4 Comparison Between Traditional and Cloud DR Models ... 18
Figure 2.5 Recovery Point Objective & Recovery Time Objective ... 34
Figure 2.6 The Model of Disaster Recovery System ... 37
Figure 2.7 Typical Deployment Scenario ... 38
Figure 2.8 Disaster-CDM architecture ... 39
Figure 2.9 Deployment Architecture of Optimal ... 39
Figure 2.10 Framework of Disaster Recovery Assistance ... 40
Figure 2.11 Data Backup Process ... 44
Figure 2.12 Data Recovery Process ... 45
Figure 2.13 PRCR Architecture ... 46
Figure 2.14 Procedure Between DBMS/CSP ... 48
Figure 3.1 Methodology of the Research ... 56
Figure 3.2 The Proposed Mechanism Architecture ... 57
Figure 3.3 The Proposed Framework of Disaster Recovery ... 58
Figure 3.4 CloudSim Architecture ... 61
Figure 3.5 CloudAnalyst Architecture ... 63
Figure 3.6 The Process Diagram of Disaster Recovery Model ... 64
Figure 4.1 PDRPMR Architecture ... 70
Figure 4.2 Working Process of Proposed Mechanism ... 74
Figure 4.3 The Algorithm of Minimum Replication Algorithm ... 78
Figure 4.4 Pseudo Code of Metadata Distribution Algorithm ... 82
Figure 5.1 Data Backup Model ... 87
Figure 5.2 Flowchart of the Data Backup Model ... 89
Figure 5.3 Data Recovery Model ... 91
Figure 5.4 Flowchart of the Data Recovery Model ... 93
Figure 6.1 Flowchart of the Disaster Recovery Model ... 100
Figure 6.2 The Cost of 1, 2, and 3-Replicas Using the Cost-Preferred Strategy ... 106
Figure 6.3 The RTO of 1, 2, and 3-Replicas Using the Cost-Preferred Strategy ... 107
Figure 6.4 The Cost of 1, 2, and 3-Replicas Using the RTO-Preferred Strategy ... 108
Figure 6.5 The RTO of 1, 2, and 3-Replicas Using the RTO-Preferred Strategy ... 109
Figure 6.6 Cost ($) for 1 and 3-Replicas Using 3 Scheduling Strategies With 500 Tasks ... 113
Figure 6.7 Cost ($) for 2 and 3-Replicas Using 3 Scheduling Strategies With 500 Tasks ... 114
Figure 6.8 RTO (ms) for 1 and 3-Replicas Using 3 Scheduling Strategies With 500 Tasks ... 115
Figure 6.9 RTO (ms) for 2 and 3-Replicas Using 3 Scheduling Strategies With 500 Tasks ... 116
LIST OF ABBREVIATIONS
A/C Alternating Current
Amazon S3 Amazon Simple Storage Service
ARPANET Advanced Research Projects Agency Network
BC Business Continuity
BIA Business Impact Analysis
CBF Critical Business Function
CI Checking Interval
CIS Set of Checking Interval values
CP Cloud Provider
CPE Cloud Provider have Enough space
CPU Central Processing Unit
CSP Cloud Service Provider
DaaS Database as a Service
DAR Data storage, request Allocation and resource Reservation
DC Data Center
DDP-DR Data Distribution Plan for multi-site DR
Disaster CDM Disaster Cloud Data Management
DNS Domain Name System
DR Disaster Recovery
DRaaS Disaster Recovery as a Service
DR-Cloud Cloud Disaster Recovery
DRP Disaster Recovery Plan
EC2 Elastic Compute Cloud
EHR Electronic Health Record
ERP Enterprise Resource Planning
ET Expected Time/Expected Storage Duration
GB Gigabyte
GFS Google File System
GRA Geographical Redundancy Approach
GUI Graphical User Interface
HDFS Hadoop Distributed File System
IaaS Infrastructure as a Service
IDEMA Impact of Decoupling and Modulation
iSCSI Internet Small Computer System Interface
IT Information Technology
JTA Java Transaction API
KaaS Knowledge as a Service
MAO Maximum Acceptable Outage
MB MegaByte
Mbps Megabits Per Second
NAS Network Attached Storage
NetDB2 Network Database2
NetDB2-MS Network Database2 Management System
NIST National Institute of Standards and Technology
NoSQL Non Structured Query Language
OA & M Operations Administration and Management
OLTP Online Transaction Processing
OMNet++ Objective Modular Network Testbed in C++
OS Operating System
OSM Organizational Sustainability Modeling
PaaS Platform as a Service
PC Personal Computer
PDRPMR Preventive Disaster Recovery Plan with Minimum Replica
PRCR Proactive Replica Checking for Reliability
RMAN Recovery Manager
RPO Recovery Point Objective
RTO Recovery Time Objective
RTT Round Trip Time
SaaS Software as a Service
SAN Storage Area Networks
SLO Service Level Objective
SMB Small and Medium Business
SOA Service-Oriented Architectures
SQL Structured Query Language
SSP Storage Service Provider
TB TeraByte
VM Virtual Machine
ZB ZettaByte
CHAPTER ONE: INTRODUCTION
1.1 OVERVIEW
With the rapid growth of Internet technologies, large-scale online services such as data backup and data recovery have increased in recent years. Because these services require substantial networking, processing and storage capacities, it is a critical challenge to design large-scale computing infrastructures that support these services in a cost-effective manner. As a solution, cloud computing has been refined during the past decade and has become an attractive business for organizations that own large datacenters and rent their computing resources (Rimal et al., 2011; Tsai et al., 2010).
Cloud computing delivers numerous benefits, including reduced costs for data storage backup and data accessibility.
The essential cloud characteristic is its ability to store data while ensuring its availability, which is an important feature when storing sensitive information.
However, the rapid growth in the scale and complexity of today's cloud services and infrastructures has also revealed important challenges regarding the design of fundamental cloud computing architectures. These challenges concern, in particular, high data reliability requirements and storage costs.
Various studies on maintaining data reliability have focused on software. The majority of the proposed solutions suggest that the data must be replicated into at least three copies (3 replicas) to ensure high data reliability (Li et al., 2012; Gu et al., 2014). These replicas can be placed either in one location or distributed over multiple locations. However, this solution incurs high storage costs, consumes significant volumes of storage space, and causes high network traffic, mainly for data-intensive applications in the cloud. Furthermore, the current approaches to data backup and recovery for single-cloud environments require vast amounts of storage space due to the creation of multiple replicas in numerous Data Centers (DCs) (Li et al., 2012; Gu et al., 2014; Sengupta and Annervaz, 2014).
Accordingly, the use of a single-cloud paradigm can generate risks, including hardware faults and software errors, natural disasters, and damage by human interference. These issues can lead to service disruptions or a total loss of data through a system collapse (Li et al., 2012; Gu et al., 2014; Sengupta and Annervaz, 2014).
Cloud computing development is not recommended without considering these risks, which may be particularly pronounced when only one DC is involved. Various Cloud Providers (CPs) address these risks via practical measures, including the geographic dispersion of data. However, DCs in different locations are still operated by a single cloud service provider. They usually use the same infrastructures and software stacks and have similar or identical operational processes and management teams (Gu et al., 2014). Many surveys conducted over recent years have shown that enterprises and critical business organizations are moving from the single-cloud to the multi-cloud (Tebaa and Hajji, 2014; Sengupta and Annervaz, 2014). Using two or more clouds simultaneously reduces the risk of failure with regard to service availability, data loss, and compromised privacy, and reduces the risk of relying on a public cloud for applications and data. The most common barriers to the adoption of the cloud are cost, security, reliability, and loss of control. However, a multi-cloud environment gives an organization greater flexibility and control in deciding which workloads will be run and where they should be run (Sulochana and Dubey, 2015).
The overarching theme of this study is the set of critical factors that influence the Disaster Recovery (DR) plan, including minimizing storage costs, reducing the Recovery Time Objective (RTO), ensuring a high reliability rate and decreasing the number of replicas to fewer than 3 (the typical number of replicas) in a multi-cloud environment.
1.2 PROBLEM STATEMENT
In today’s business environment, the Information Technology (IT) data services operated by CPs face many challenges in ensuring the reliability of data services before and after disasters (Saquib et al., 2013). Data services must ensure reliability and flexibility through an effective and practical DR plan, which is vital for any organization to prosper and sustain growth (Saquib et al., 2013).
The main concern with DR in the cloud is how to ensure an effective data backup and recovery process that achieves high data reliability before a disaster while maintaining a reasonable cost (Saquib et al., 2013). Several solutions for data backup have been designed for a single-cloud architecture (Saquib et al., 2013; Suguna and Suhasini, 2014; Lenk, 2015; Jena and Mohanty, 2016). However, having only one copy of the data in a single-cloud environment may not be a good solution because any damage to the data in the case of disaster will result in a permanent loss (Tebaa and Hajji, 2014; Gu et al., 2014; Sengupta and Annervaz, 2014). Other solutions for developing a data backup and recovery plan involve multiple cloud providers, in which multiple data replicas are generated across several remote CPs (Gu et al., 2014; Sengupta and Annervaz, 2014; Sulochana and Dubey, 2015; Toosi and Buyya, 2017). This approach guarantees high data reliability and minimizes the risk of data loss in case of disaster, thereby ensuring that user data are recoverable in the event of catastrophic failure.
According to Vukolic (2010), the main purpose of moving to a multi-cloud environment is to improve on what a single cloud can offer by distributing data reliability among multiple CPs. The single-cloud is expected to become less popular with customers due to the risks of data service availability failure and the possibility of malicious insiders. DCs in different locations owned by one CP primarily use similar operational environments and infrastructures, which may affect the recovery of data services. For instance, if we entrust our data DR solution to a single-cloud provider that does not have a backup solution, or that hosts the data on a single platform or in the same geographic area, the risk of downtime increases, and customers might be unable to access their data for several hours.
Most proposed solutions assume that the data should be replicated into at least three copies (3 replicas) to ensure high reliability (Li et al., 2012; Gu et al., 2014; Li et al., 2014; Li et al., 2016; Du et al., 2017). These copies might be in one location or distributed over multiple remote locations. Nevertheless, these solutions incur high storage costs and consume a significant amount of storage space, which leads to high network traffic, particularly for data-intensive applications in the cloud (Li et al., 2012; Gu et al., 2014; Li et al., 2014; Li et al., 2016; Du et al., 2017).
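To make the storage overhead of the 3-replica convention concrete, the following is a minimal arithmetic sketch, not part of the thesis itself. The function names `storage_saving` and `stored_volume_gb` are hypothetical, and the only assumption is that storage consumption grows linearly with the number of replicas.

```python
from fractions import Fraction


def storage_saving(replicas: int, baseline: int = 3) -> Fraction:
    """Fraction of storage saved by keeping `replicas` copies instead of `baseline`.

    Assumption: storage consumption is proportional to the number of replicas.
    """
    if not 0 < replicas <= baseline:
        raise ValueError("replicas must be between 1 and baseline")
    return 1 - Fraction(replicas, baseline)


def stored_volume_gb(data_gb: float, replicas: int) -> float:
    """Total volume stored across all replicas, in GB."""
    return data_gb * replicas


if __name__ == "__main__":
    for r in (1, 2, 3):
        print(r, "replica(s):", stored_volume_gb(100, r), "GB stored,",
              storage_saving(r), "saved versus 3 replicas")
```

Under this linear assumption, keeping 2 replicas saves one-third of the storage used by 3 replicas, and keeping 1 replica saves two-thirds, which matches the range of savings reported in the abstract.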
Moreover, most of the previous approaches do not consider the required level of data reliability, that is, whether the data to be stored are critical or non-critical. In addition, the storage duration, that is, whether the user wishes to store the data for the short term or the long term, has not been taken into account when replicating the data across distributed CPs.
Thus, an efficient data backup and recovery strategy for DR in a multi-cloud environment that takes into account the level of importance of the data and the duration of storage must be explored. The solution should take into consideration the critical factors that influence the DR plan, including minimizing storage costs, reducing RTO, ensuring high data reliability rates and decreasing the number of replicas (to fewer than 3).
1.3 RESEARCH QUESTIONS
In the following, we outline the research questions addressed in this research work:
1. What are the current methods available for data backup and recovery and for the maintenance of these services during disasters?
2. What are the limitations of the current methods used for data backup and recovery operations, and how does an effective and practical DR plan ensure the availability, reliability and flexibility of services?
3. Is it possible to apply the current DR techniques designed for a single-cloud environment to DR in a multi-cloud environment?
4. How does the data backup and recovery process perform during disasters in a multi-cloud context?
5. How is the availability of services maintained and the continuity of these services ensured during disasters?
1.4 RESEARCH OBJECTIVES
The objectives of this thesis are as follows:
1. To design an approach for data replication management in a multi-cloud environment that determines the number of replicas (which should be less than 3) and reduces cloud storage consumption while meeting data reliability requirements.
2. To propose an approach for data backup and recovery for multi-cloud architecture with the aim of minimizing backup storage costs and RTOs and ensuring high data reliability.
3. To propose scheduling strategies that offer different data backup and recovery solutions based on user-given criteria such as Cost, RTO and Cost/RTO.
4. To design and develop a framework for data recovery in a multi-cloud environment that provides solutions based on user preferences during disasters.
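The idea behind the third objective, selecting a backup and recovery plan according to a user-given criterion, can be sketched as follows. This is an illustrative sketch only and is not the scheduling algorithm of the thesis. The `Plan` class, the `choose_plan` function, the provider labels and the numbers are all hypothetical. In particular, the combined Cost/RTO criterion is assumed here to be an equal-weight sum of normalized cost and normalized RTO, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    provider: str   # hypothetical cloud provider label
    cost: float     # backup storage cost in dollars
    rto_ms: float   # recovery time objective in milliseconds


def choose_plan(plans: list[Plan], preference: str) -> Plan:
    """Pick one plan according to the user's preference.

    preference is "cost", "rto" or "cost/rto". The combined score is an
    assumption: equal-weight sum of cost and RTO, each divided by its maximum.
    """
    if not plans:
        raise ValueError("no candidate plans")
    if preference == "cost":
        return min(plans, key=lambda p: (p.cost, p.rto_ms))
    if preference == "rto":
        return min(plans, key=lambda p: (p.rto_ms, p.cost))
    if preference == "cost/rto":
        max_cost = max(p.cost for p in plans) or 1.0
        max_rto = max(p.rto_ms for p in plans) or 1.0
        return min(plans, key=lambda p: p.cost / max_cost + p.rto_ms / max_rto)
    raise ValueError(f"unknown preference: {preference}")


if __name__ == "__main__":
    candidates = [
        Plan("CP-A", cost=10.0, rto_ms=900.0),
        Plan("CP-B", cost=30.0, rto_ms=200.0),
        Plan("CP-C", cost=14.0, rto_ms=400.0),
    ]
    for pref in ("cost", "rto", "cost/rto"):
        print(pref, "->", choose_plan(candidates, pref).provider)
```

Normalizing by the maximum keeps cost and RTO on comparable scales before they are added; a different weighting would change which plan the combined criterion selects.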
1.5 RESEARCH SCOPE
The scope of this research work is outlined in the following points:
• This research focuses on designing and developing a framework for data recovery in a multi-cloud environment to provide numerous solutions based on user preferences before and after disasters. Moreover, we examine the critical factors that influence a DR plan, including minimizing storage costs, reducing RTO, ensuring high reliability rates and decreasing the number of replicas to less than three (the typical number of replicas).
• Furthermore, we focus on issues related to data reliability services in a multi-cloud environment, including a new approach for cost-effective data reliability with minimum replicas and effective data recovery solutions before and after disasters.
• Because the data backup and recovery process requires a significant amount of time, data can often be lost during disasters. Therefore, this research considers the following two performance metrics to evaluate the proposed approach: the cost of backup storage and the RTO. These two metrics have been used most frequently in the literature (Sengupta and Annervaz, 2012; Saquib et al., 2013; Gu et al., 2014; Khoshkholghi et al., 2014; Sengupta and Annervaz, 2014; Suguna and Suhasini, 2014; Alhazmi, 2016).
• The three primary DR levels are the data level, system level, and application level. The concern at the system level is data backup and recovery in the shortest recovery time, whereas the focus at the application level is on maintaining data reliability before and after disasters. Thus, this research mainly emphasizes the system and application levels (Prakash et al., 2012; Khoshkholghi et al., 2014).
1.6 RESEARCH SIGNIFICANCE
The aim of this research is to design and develop a framework for data recovery in a multi-cloud environment that provides numerous solutions before and after disasters based on user preferences. There is significant demand for a multi-cloud infrastructure that guarantees data reliability and ensures these services during disasters (Li et al., 2012; Gu et al., 2014; Li et al., 2014; Li et al., 2016; Du et al., 2017). This research also aims to propose a cost-effective approach that determines the number of replicas (which should be fewer than 3), thereby reducing cloud storage consumption while meeting data reliability requirements. In addition, it proposes various scheduling strategies that offer different data backup and recovery solutions based on user criteria such as Cost, RTO and Cost/RTO. These two factors, cost and RTO, are the most critical factors that influence users when choosing the best plan for data backup and recovery (Sengupta and Annervaz, 2012; Saquib et al., 2013; Gu et al., 2014; Khoshkholghi et al., 2014; Sengupta and Annervaz, 2014; Suguna and Suhasini, 2014; Alhazmi, 2016).
1.7 ORGANIZATION OF THE THESIS
This thesis is organized as follows:
Chapter 1 is an introductory chapter that discusses the problem statement, the research questions, the objectives of the research, the scope of the research and the research significance.
Chapter 2 is a background chapter that explains the fundamental concepts in DR and cloud computing. The chapter also introduces the main concepts of the preferred data DR techniques in cloud computing. Various backup replica scheduling strategies that offer different data backup and recovery processes in a multi-cloud architecture are examined and extensively discussed. The chapter also reviews relevant works by previous researchers on DR in cloud computing, including single-cloud and multi-cloud environments.
Chapter 3 depicts the research methodology of the thesis and describes how this research was conducted. The chapter also discusses the different phases in this research and the methodology followed during each phase. The measurement metrics and the datasets used in the experiments are presented.
Chapter 4 presents a detailed description of the proposed approach for data replication management strategy in a multi-cloud environment. This chapter also