1. Introduction
Cloud computing environments that provide computing services, such as servers and
storage over the internet, offer flexibility and efficiency to numerous businesses
and organizations. However, they also risk potential disasters such as data loss,
security breaches, and service interruptions. In these situations, data backup for
disaster recovery plays a critical role.
Beyond recovering from system errors or hardware failures, data backups provide organizations
with several benefits. First, a backup can meet legal obligations to comply with specific
laws or regulations regarding data protection and backup in certain industries. Second,
backups can play a pivotal role in the face of security threats. The ability to rapidly
recover from data security threats, such as ransomware or hacking attacks, ensures
that user data are protected, enhancing trust and reliability [1,2]. Quick data recovery meets the Recovery Time Objective and minimizes disruptions
to crucial business processes. Third, unforeseen natural disasters such as earthquakes,
floods, and fires can strike, in which situations a backup strategy can safeguard
the data [3,4].
In summary, data backup for disaster recovery in a cloud-computing environment
is not merely a mechanism for protecting data. It serves critical functions in various
aspects, such as business strategy, legal compliance, security, innovation, and reliability
enhancement. Considering these multifaceted aspects, data backup can be deemed an
essential component for the successful operation of contemporary organizations, and
an efficient backup strategy can support their survival and growth [5].
The environment in which a disaster recovery backup strategy is executed is of utmost
importance. Currently, most cloud environments use container-based infrastructures
that require container orchestration tools for effective management. Kubernetes [6] is the most widely used of these tools owing to its scalability,
automation, and compatibility with various cloud providers. Therefore, Kubernetes
plays a crucial role in disaster recovery backup strategies. By employing this tool,
a complex infrastructure can be easily managed, offering the flexibility to rapidly
recover from failures. Kubernetes can automatically deploy and manage ``containerized''
applications. ``Containerization'' refers to packaging an application and its runtime
environment into an isolated unit called a ``container,'' thereby simplifying processes
like development, testing, and deployment while also allowing for efficient resource
usage.
However, a containerized environment has a complex structure, in which various containers
interact. Kubernetes backups must address this complexity and provide stability. Effective
backups and restorations of applications, configurations, and data are crucial, and
there should be capabilities to recover quickly in case of failure. Moreover, backups
via Kubernetes can be carried out consistently not only in cloud environments but
also in on-premises environments, where IT infrastructure is managed in a company’s own
data center or local servers, and in hybrid environments. This simplifies traditional complex backup methods
while ensuring cloud flexibility and scalability. Typical tools for disaster recovery
in a Kubernetes environment include the ``Kubernetes Backup and Restore Tool,'' which
primarily focuses on recovering resources and settings in a single Kubernetes cluster
environment, and the ``Recovery Function in a Multicluster Management Platform,''
which centers on recovering resources and settings in environments managing multiple
Kubernetes clusters.
Backup and restoration tools for Kubernetes have become increasingly important owing
to the growing complexity of cloud-native infrastructure and applications. As complexity
increases, so does the risk of failure; hence, such tools can safely preserve the
data, configurations, and service states of Kubernetes clusters. With this conserved
information, the system can be swiftly and reliably restored upon encountering issues.
The key features and functionalities of such systems are as follows:
·Automated Cluster Recovery Support: Not every tool offers automated cluster recovery;
manual intervention or additional automation is necessary in environments requiring
this feature.
·Data Transfer and Restoration between Clusters: This supports business continuity,
ensuring high availability and data consistency.
·Data Management and Security Features: Real-time data snapshots, data encryption, and
configurable automatic recovery procedures support data management and security
in cloud environments.
Kubernetes backup and restore tools offer flexibility, addressing diverse and complex
requirements. Together, these features enhance data stability and service reliability
for users.
Furthermore, the multicluster management platform offers a solution that allows the
integrated management of diverse Kubernetes clusters and services, especially as centralized
management becomes challenging owing to the increasing complexity of modern cloud
services and applications. The primary features and functionalities of this platform
include the following:
·Offering consistent management of resource allocation, access rights, and policy
settings across multiple clusters to enhance efficiency and flexibility.
·Automating repetitive tasks, such as scaling infrastructure, updating, and monitoring,
to reduce operational complexity.
·Automating data synchronization and backups between various clusters and applications
to ensure data security and consistency.
The multicluster management platform, in addition to providing integrated management
of various clusters, reduces the administrative burden on organizations’ IT infrastructure
through several automated features. This ultimately leads to a more effective achievement
of business objectives.
While existing studies primarily focus on comparing and evaluating Kubernetes backup
and restore tools, our research uniquely analyzes and contrasts the Kubernetes backup
and restore tools and the recovery features of multicluster management platforms.
On this basis, we provide the specific criteria necessary for tool selection. Our research
stands out for addressing both cluster-level and application-level recovery. It
investigates the feasibility and necessity of developing an automated
recovery system by integrating the recovery capabilities of Kubernetes recovery tools
and multicluster management platforms. Future research will explore the possibility
of a system that automatically restores Kubernetes clusters and their internal applications
in a multicluster environment by integrating these two types of recovery tools and
introducing recovery automation functions.
This paper is structured as follows: Section 2 reviews relevant studies to present
the significance of disaster recovery in the Kubernetes environment. Section 3 introduces
the functionalities and characteristics of various backup and restoration tools and
compares them. Section 4 investigates and compares the recovery features and attributes
of the multicluster management platforms. Section 5 discusses directions for future
research, focusing on a system that automatically restores both managed clusters and
applications. Integrating automatic recovery can overcome the existing system limitations
and significantly enhance the efficiency and reliability of data backup and restoration.
Finally, Section 6 concludes the study by discussing its expected impact on this field
of research.
2. Related Work
Sammer [7] conducted a thorough analysis and comparison of the mechanisms and tools necessary
for the backup and restoration of systems based on Kubernetes. The study examines
in particular the practicality and efficiency of two primary backup and restoration
tools, namely Velero [8] and Kasten K10 [9]. Both tools were evaluated on various fronts, including different backup strategies,
volume snapshot functionalities, cloud compatibility, and user-permission settings
(RBAC).
Tamimi [10] provided a detailed account of the selection, implementation, and application of
backup and restoration tools in real-world industrial settings. Their study describes
various strategies and configurations that enable continuous data protection and swift
restoration. Furthermore, the study extensively considers several technical factors
when choosing backup and restoration tools, including data encryption, access control,
compatibility with various cloud service providers, and locations for storing backup
data. This comprehensive analysis proposes criteria for selecting the most efficient and
secure backup and restoration solution. Their paper also examined how the lifecycle
of Kubernetes and the backup and restoration process interact, assisting DevOps teams
in selecting and implementing the appropriate backup and restoration strategy at various
system stages. This is particularly valuable when developing and maintaining deployment
models for cloud-native applications and APIs.
Yu [11] focused on application recovery automation in cloud-computing-based aerospace ground
systems. He developed and presented a recovery service for software-failure scenarios.
The recovery service mainly emphasized application recovery automation, aiming to
bolster stability and availability in a cloud-computing environment. The study details
the technical specifics of various software recovery strategies and presents
experimental results evaluating software recovery time and capability. An automatic
recovery feature was proposed and designed for applications in response to software
failures in a cloud-computing environment, offering recovery strategies for diverse
application scenarios. The study verified the efficacy and performance of these recovery
strategies through actual experiments and introduced an automated method to maintain
the persistence of access portals after application recovery, ensuring the continuity
of upstream business without interruptions. Users can continue using the service without
considering the recovery process. This research differentiates itself by its focus
on application recovery automation in a cloud environment, whereas most previous studies
[12-14] were primarily concentrated on data and system recovery. Thus, it offers measures
for enhancing the stability and availability of applications in cloud environments.
Each of these studies provided an in-depth analysis of backup and restoration strategies
in Kubernetes and cloud-computing environments from various perspectives. Sammer [7] contrasted the practicality and efficiency of notable Kubernetes backup and restoration
tools, such as Velero and Kasten K10, evaluating multiple backup strategies and features.
Tamimi [10] offered comprehensive guidelines for selecting and implementing backup and restoration
tools and analyzed their feasibility in real-world business settings. Yu [11] focused on application recovery in cloud-based aerospace ground systems and suggested
recovery strategies and automated methods at the application level. All the studies
collectively explored diverse methods to enhance stability and availability in their
respective domains. A holistic analysis of these studies assists in determining optimal
backup and restoration strategies tailored to various requirements and settings.
3. Kubernetes Backup and Restoration Tools
Kubernetes backup and restoration tools offer functionalities essential for safely
managing the data and system state of containerized applications [15]. Data are one of the core assets of today’s enterprises and organizations, and their
protection and management carry an immense responsibility. Improperly managed data
can lead to failures and can significantly impact business continuity. These
tools play a pivotal role in securely backing up critical data and settings
and ensuring quick restoration in the event of failures or data loss.
Because of the intricate and dynamic structure of Kubernetes, traditional
backup methods are often insufficient for it. Particularly in distributed environments, such as microservice
architectures, maintaining data consistency becomes even more imperative and complex.
In such scenarios, Kubernetes backup and restoration tools continuously monitor the
resources and states within the cluster, performing regular or ad-hoc backups to enable
swift recovery upon failure. Furthermore, these tools offer advanced functionalities,
such as data management in multicluster or multicloud environments, backup scheduling,
and policy-based management, allowing for flexible responses in complex operational
scenarios.
With the mainstream adoption of cloud computing and containerization technologies,
the significance of data backup and restoration tools has increased. Kubernetes backup
and restoration tools have cemented their place as crucial components of modern IT
infrastructure, supporting data availability, integrity, and security while also
catering to business requirements and regulatory compliance.
In the following subsections, we describe and compare notable
Kubernetes backup and restoration tools: Velero, Kasten
K10, Portworx Backup, and KubeDR. We examine the distinct features and functionalities
of each tool and, by comparing them, analyze
which tool may be more appropriate for particular users. This will assist
users in selecting the most suitable backup and restoration tool for their Kubernetes
environment.
3.1 Velero
Velero, also known by its previous name Heptio Ark, facilitates the backup and restoration
of Kubernetes cluster resources and persistent volumes [16]. Velero can be operated in both cloud service providers and on-premises environments.
It can perform restorations at both the cluster and application levels. This allows
the recovery of an entire cluster or, if needed, of specific applications
or namespaces. Velero comprises a server running inside the cluster and a command-line
client running locally. The server includes controllers that manage the Kubernetes custom
resources for backup, restoration, and related operations. Therefore, Velero can
back up or restore an entire cluster or objects filtered by type,
namespace, or label. It is considered an ideal tool not only for disaster recovery
use cases, but also for snapshotting application states before performing system operations
on a cluster.
3.1.1 Backup in Velero
Velero manages and creates backups for the data and resources of a Kubernetes cluster.
The types of backup include the following:
·On-demand backup: Users can manually initiate backup as required. This operation
preserves the state of the Kubernetes cluster, allowing the restoration of the entire
or parts of the cluster when needed.
·Scheduled backup: Velero performs backups automatically on a cron-style schedule. This ensures
that users maintain consistent snapshots of the Kubernetes cluster, which is highly
beneficial for regular inspections and disaster recovery scenarios.
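As an illustration of a scheduled backup, the following minimal Python sketch assembles (without executing) a ``velero schedule create'' command line. The helper name, the schedule name, the namespaces, and the retention value are assumptions introduced for this example only.

```python
import shlex
from typing import List, Optional


def velero_schedule_args(name: str, cron: str,
                         namespaces: Optional[List[str]] = None,
                         ttl: Optional[str] = None) -> List[str]:
    """Build the argument list for `velero schedule create` (nothing is run)."""
    # Accept five-field cron expressions or "@..." shorthands such as "@every 6h".
    if not cron.startswith("@") and len(cron.split()) != 5:
        raise ValueError("expected a five-field cron expression or an '@' shorthand")
    args = ["velero", "schedule", "create", name, f"--schedule={cron}"]
    if namespaces:
        args.append("--include-namespaces=" + ",".join(namespaces))
    if ttl:
        args.append(f"--ttl={ttl}")  # how long each resulting backup is kept
    return args


if __name__ == "__main__":
    # Hypothetical example: back up two namespaces every day at 02:00.
    print(shlex.join(velero_schedule_args("nightly", "0 2 * * *",
                                          ["shop", "billing"], "168h")))
```

Returning an argument list rather than a single string keeps the cron expression as one argument when the command is later passed to a process launcher.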
The backup flow begins by executing the ``velero backup create'' command, which interacts
with the Kubernetes API server. The BackupController acknowledges the newly created
backup object and conducts a validation. The data are then fetched by querying the
API server, and the backup files are subsequently uploaded to the object storage service.
By default, Velero creates disk snapshots for all the persistent volumes. This snapshot
creation can be adjusted with specific flags and deactivated using the ``--snapshot-volumes=false''
option.
Fig. 1 illustrates the backup flow of Velero. In Step 1, the Velero user creates a backup
custom resource (CR) by calling the Kubernetes API server. In Step 2, the backup controller,
which watches the Kubernetes API server, detects the newly created backup CR. In Step
3, the backup controller queries the Kubernetes API server to collect the data to be backed up.
The backup controller invokes the cloud storage service to upload the backup file
in Step 4, and in Step 5, it generates the disk snapshot of the Persistent Volume
(PV).
Fig. 1. Backup flow of Velero.
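Step 1 of this flow can be made concrete by sketching the backup CR itself. The minimal Python example below builds a Velero Backup object as a dictionary and serializes it to JSON, which the Kubernetes API also accepts. The helper name, the object name, the namespace list, and the label are assumptions for illustration; the sketch does not contact a cluster.

```python
import json
from typing import Dict, List, Optional


def velero_backup_manifest(name: str, namespaces: List[str],
                           match_labels: Optional[Dict[str, str]] = None,
                           snapshot_volumes: bool = True,
                           ttl: str = "720h0m0s") -> dict:
    """Return a Backup custom resource as a plain dictionary."""
    spec: dict = {
        "includedNamespaces": namespaces,
        # False corresponds to the --snapshot-volumes=false CLI option.
        "snapshotVolumes": snapshot_volumes,
        "ttl": ttl,
    }
    if match_labels:
        spec["labelSelector"] = {"matchLabels": match_labels}
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Backup",
        # "velero" is the namespace commonly used for the Velero server.
        "metadata": {"name": name, "namespace": "velero"},
        "spec": spec,
    }


if __name__ == "__main__":
    manifest = velero_backup_manifest("shop-backup", ["shop"],
                                      {"app": "mysql"}, snapshot_volumes=False)
    print(json.dumps(manifest, indent=2))
```

Once such an object is submitted, the backup controller described in Steps 2 to 5 would take over; the sketch covers only the creation of the request.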
3.1.2 Restoration in Velero
Velero restores the data and resources from previous backups. It can restore all objects
and PVs or use filters to specify a subset of objects and volumes that one wants to
restore. This accommodates various scenarios that can arise in Kubernetes clusters.
By default, Velero conducts nondestructive restorations, meaning that it does not
delete data in the target cluster. If a resource from the backup already exists in
the target cluster, it is skipped. However, users can utilize the ``--existing-resource-policy''
restore flag to configure Velero to update existing resources to match those in the
backup.
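The nondestructive behavior described above can be modeled in a few lines. The following Python sketch is a simplified illustration, not Velero's implementation: resources are identified by a hypothetical (kind, namespace, name) key, and the policy values ``none'' and ``update'' mirror the restore flag.

```python
from typing import Dict, Tuple

Key = Tuple[str, str, str]  # (kind, namespace, name), an assumed identifier


def plan_restore(backup: Dict[Key, dict], cluster: Dict[Key, dict],
                 existing_resource_policy: str = "none") -> Dict[Key, str]:
    """Decide, per backed-up resource, whether to create, update, or skip it.

    Resources that exist only in the cluster never appear in the plan,
    which reflects that the restoration deletes nothing.
    """
    if existing_resource_policy not in ("none", "update"):
        raise ValueError("policy must be 'none' or 'update'")
    plan: Dict[Key, str] = {}
    for key, obj in backup.items():
        if key not in cluster:
            plan[key] = "create"
        elif existing_resource_policy == "update" and cluster[key] != obj:
            plan[key] = "update"
        else:
            plan[key] = "skip"
    return plan


if __name__ == "__main__":
    backup = {("ConfigMap", "shop", "cfg"): {"v": 1},
              ("Service", "shop", "web"): {"port": 80}}
    cluster = {("ConfigMap", "shop", "cfg"): {"v": 2},
               ("Secret", "shop", "token"): {}}
    print(plan_restore(backup, cluster))
    print(plan_restore(backup, cluster, "update"))
```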
The restoration flow starts by executing the ``velero restore create'' command, in
which the client sends a request to the Kubernetes API server to produce a restore
object. Subsequently, the RestoreController recognizes the new restore object and
validates it. After retrieving the backup details from remote storage, it preprocesses
the backup resources. This step is crucial for ensuring that the restored resources
function properly in the cluster. Subsequently, the resources are individually restored.
Fig. 2 illustrates the restoration flow of Velero. In Step 1, the Velero user initiates
a restore CR by calling the Kubernetes API server. In Step 2, the restore controller,
which watches the Kubernetes API server, detects the newly created restore CR.
In Step 3, the restore controller fetches backup details from the cloud storage
service and then preprocesses the backed-up resources so that they operate correctly
within the cluster. In Step 4, the restore controller sends a request to the Kubernetes API server
to initiate the restoration process.
Fig. 2. Restoration flow of Velero.
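The request that starts this flow can be sketched in the same way as for backups. The Python helper below assembles (without executing) a ``velero restore create'' command line; the helper name, the restore and backup names, and the namespace are assumptions chosen for illustration.

```python
import shlex
from typing import List, Optional


def velero_restore_args(name: str, backup: str,
                        namespaces: Optional[List[str]] = None,
                        existing_resource_policy: Optional[str] = None) -> List[str]:
    """Build the argument list for `velero restore create` (nothing is run)."""
    args = ["velero", "restore", "create", name, f"--from-backup={backup}"]
    if namespaces:
        # Restrict the restoration to a subset of the backed-up namespaces.
        args.append("--include-namespaces=" + ",".join(namespaces))
    if existing_resource_policy is not None:
        if existing_resource_policy not in ("none", "update"):
            raise ValueError("policy must be 'none' or 'update'")
        args.append(f"--existing-resource-policy={existing_resource_policy}")
    return args


if __name__ == "__main__":
    # Hypothetical example: restore one namespace and update existing resources.
    print(shlex.join(velero_restore_args("shop-restore", "shop-backup",
                                         ["shop"], "update")))
```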
3.2 Kasten K10
Kasten K10 is a data management and protection platform optimized for modern cloud-native
environments and Kubernetes. It offers automated workflows for various functions,
such as migration, disaster recovery, backup, and data protection, to manage an application’s
lifecycle. These features are easily configurable through an intuitive user interface
and provide flexibility through integration with various cloud environments and storage
backends. Such multifunctionality reduces the complexity of data management when implementing
a secure protection mechanism.
3.2.1 Backup in Kasten K10
The features and characteristics provided by Kasten K10 include the following:
·Policy-based Backup: Kasten K10 supports policy-based backup management. Users can
define backup policies that include parameters such as the backup frequency, retention
period, and target storage location. Once set, the system automatically performs backup
operations based on these policies.
·Application-centric Backup: Rather than just backing up data, Kasten K10 also backs
up related Kubernetes configurations, storage information, and all metadata associated
with the application. This ensures a more accurate restoration of the entire application
state.
·Support for Various Storage Backends: Kasten K10 supports backups for various storage
solutions and cloud environments. Users can choose the storage backend best suited
to their environment to store the backup data.
·Data Efficiency: To store backup data efficiently, Kasten K10 applies deduplication,
compression, and efficient data-transfer methods. This promotes storage-space optimization
and faster backup operations.
·Reliability and Robustness: The backup process of Kasten K10 is designed to be robust,
supporting features such as retrying interrupted backups and resuming data transfers
during network instability. This ensures the reliability of the backup operations,
which is particularly crucial when backing up large datasets.
·Encryption: Kasten K10 applies encryption to backup data for security, preventing
unauthorized access to or leakage of data containing sensitive information.
·Scalability: Kasten K10’s backup architecture is designed considering the scalability
of cloud-native environments, ensuring fast and efficient backup operations even in
large and complex cluster settings.
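The idea of policy-based backup can be sketched independently of any product. The following Python example is a generic model under stated assumptions, not Kasten K10's policy API: the class and function names are hypothetical, and the policy holds only the three parameters named above (frequency, retention, and target location).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class BackupPolicy:
    frequency: timedelta  # minimum time between two backups
    retain: int           # number of most recent backups to keep
    location: str         # target storage location (opaque here)


def is_due(policy: BackupPolicy, last_backup: Optional[datetime],
           now: datetime) -> bool:
    """A backup is due if none exists or the frequency interval has elapsed."""
    return last_backup is None or now - last_backup >= policy.frequency


def split_by_retention(policy: BackupPolicy, taken_at: List[datetime]
                       ) -> Tuple[List[datetime], List[datetime]]:
    """Return (kept, expired) backup timestamps, newest first."""
    ordered = sorted(taken_at, reverse=True)
    return ordered[:policy.retain], ordered[policy.retain:]


if __name__ == "__main__":
    policy = BackupPolicy(timedelta(hours=24), retain=2,
                          location="s3://example-bucket")
    start = datetime(2024, 1, 1)
    history = [start + timedelta(days=i) for i in range(4)]
    print(split_by_retention(policy, history))
```

Once such a policy is defined, a scheduler only needs to evaluate `is_due` periodically and prune with `split_by_retention`, which is the sense in which the backups run without further user action.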
3.2.2 Restoration in Kasten K10
The main features and details of Kasten K10’s restoration are as follows:
·Instant Recovery: With Kasten K10’s instant recovery feature, users can quickly recover
from a restoration point. Instant recovery can be performed much faster than regular
restoration, significantly contributing to business continuity.
·Resource Transformation: Kasten K10 supports the transformation of Kubernetes resources
during the restoration process. For instance, when restoring a recovery point created
from one cloud provider to a cluster from another cloud provider, resources such as
container image URLs or storage-class configurations can be modified.
·Cloning Applications: Kasten K10 provides a feature to restore applications to a
target namespace that is different from the namespace of the original application.
This is useful for extracting specific files or parts of the original data or for
cloning an application for debugging or testing/development purposes.
·Persistent Volume Claim (PVC) Name Alteration: Kasten K10 offers the option of changing
the name of the PVCs during restoration. This allows the renaming of PVCs depending
on workload configurations, and this feature is also used in the StatefulSet and DeploymentConfigs
settings.
·Use of Alternative Location Profiles: Users can select an exported restoration point
to restore applications from locations outside a cluster. This is difficult to achieve
when the restoration point is copied or moved to a different location.
·Application-centric Restoration: Kasten K10 restores application data along with
all metadata of the application captured during a backup. This ensures accurate restoration
of the application state and prevents any interruption to the continuity of the application.
·Restoration Flexibility: Kasten K10 provides various options and configurations for
restoration operations. This flexibility supports a range of restoration scenarios
and requirements, allowing the selection of the most suitable restoration strategy
in specific situations.
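Resource transformation during restoration can be illustrated with a small sketch. The Python function below is a generic model, not Kasten K10's transform mechanism: it rewrites the storage class of a PVC and the registry prefix of container images when a manifest moves to a different provider. The function name, the mapping arguments, and all example values are assumptions.

```python
import copy
from typing import Dict


def transform_for_target(manifest: dict,
                         storage_classes: Dict[str, str],
                         registries: Dict[str, str]) -> dict:
    """Return a transformed copy of a manifest; the input is left unchanged."""
    out = copy.deepcopy(manifest)
    if out.get("kind") == "PersistentVolumeClaim":
        spec = out.setdefault("spec", {})
        old_class = spec.get("storageClassName")
        if old_class in storage_classes:
            spec["storageClassName"] = storage_classes[old_class]
    # Workload kinds such as Deployment keep containers under spec.template.spec.
    pod_spec = out.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        image = container.get("image", "")
        for old_prefix, new_prefix in registries.items():
            if image.startswith(old_prefix + "/"):
                container["image"] = new_prefix + image[len(old_prefix):]
                break
    return out


if __name__ == "__main__":
    pvc = {"kind": "PersistentVolumeClaim",
           "spec": {"storageClassName": "gp2"}}
    print(transform_for_target(pvc, {"gp2": "managed-premium"}, {}))
```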
3.3 Portworx Backup
Portworx Backup [17] is a backup and restoration solution tailored for the Kubernetes environment. This
tool restores the applications and data across multiple clusters. Notably, with the
namespace and label selector features, users can produce granular backups, even allowing
backups of only specific resources of interest. This ensures consistency in the associated
configurations and pod data during backup. Integrated with Portworx Central, it facilitates
the management of multiple clusters and their backups from a single UI, aiding users
in easily managing backups for resources they have permission to access, even in multi-user
environments.
3.3.1 Backup in Portworx Backup
Portworx Backup’s backup capabilities offer the following diverse features and options:
·Selective Backup: Portworx Backup offers users detailed backup options. Using namespaces
and label selectors, users can precisely specify the resources to be backed up. For
example, one can conduct a backup operation targeting only MySQL pods, PVCs, and volumes
with the ``app=mysql'' label. Such fine-grained selection backs up only the necessary
resources, conserving storage space and shortening restoration times.
·Scheduling and Automation: Portworx Backup supports the automation of backup tasks
through scheduling policies. Users can set up backups to occur at specific times or
intervals, allowing off-hour backups or running backup tasks during low-traffic times,
such as weekends, to minimize system impact.
·Variety of Storage Options: When choosing a storage location for backup data, Portworx
Backup offers a range of choices. It supports major object storage such as AWS S3,
Azure Blob Storage, and Google Cloud Storage, as well as options such as Portworx
PX-Store. This flexibility allows users to select optimal storage, considering factors
such as availability, cost, and region.
·Integrated Resource Backup: Beyond data, Portworx Backup also encompasses various
Kubernetes resource types within its backup purview. It provides comprehensive backup
from core resources such as PV, Deployment, and Service to configuration and authentication
details such as ConfigMap and Secret.
·Backup Rule System: A rule system is provided that automatically executes certain
tasks or commands before and after a backup. For instance, automating tasks, such
as pausing specific services before a backup, or sending notifications after a backup
can enhance the accuracy and efficiency of the backup process.
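The selective backup described in the list above rests on namespace and label matching, which can be sketched as follows. This Python example is a simplified model that supports only equality-based selectors such as ``app=mysql''; it is not Portworx Backup's implementation, and the function names and resource values are assumptions.

```python
from typing import Dict, List


def parse_selector(selector: str) -> Dict[str, str]:
    """Parse an equality-based selector such as "app=mysql,tier=db"."""
    wanted: Dict[str, str] = {}
    for part in filter(None, (p.strip() for p in selector.split(","))):
        if "=" not in part:
            raise ValueError(f"unsupported selector term: {part!r}")
        key, value = part.split("=", 1)
        wanted[key.strip()] = value.strip()
    return wanted


def select_for_backup(resources: List[dict], namespace: str,
                      selector: str) -> List[dict]:
    """Keep resources in the namespace whose labels match every selector term."""
    wanted = parse_selector(selector)
    selected = []
    for res in resources:
        meta = res.get("metadata", {})
        labels = meta.get("labels", {})
        if meta.get("namespace") == namespace and all(
                labels.get(k) == v for k, v in wanted.items()):
            selected.append(res)
    return selected


if __name__ == "__main__":
    resources = [
        {"kind": "Pod", "metadata": {"namespace": "db", "name": "mysql-0",
                                     "labels": {"app": "mysql"}}},
        {"kind": "Pod", "metadata": {"namespace": "db", "name": "redis-0",
                                     "labels": {"app": "redis"}}},
    ]
    print([r["metadata"]["name"]
           for r in select_for_backup(resources, "db", "app=mysql")])
```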
3.3.2 Restoration in Portworx Backup
Portworx Backup’s restoration capabilities offer the following diverse features and
options:
·Original Restoration: Users can restore the backup data to their original cluster.
This is particularly useful when there is data loss or issues with specific resources
within a cluster, and there is a need to restore that resource to its original state.
·Restoration to Another Cluster: If required, backup data can be restored to different
Kubernetes clusters. This can be leveraged for purposes such as testing in different
environments or data migration.
·Restoration to a New Namespace: It is possible to restore data within the same cluster
but at a different namespace. This can be employed for rapid service recovery in situations
in which a specific namespace encounters issues.
·Namespace-based Restoration: Users can choose to restore only data corresponding
to a specific namespace from an entire backup dataset. This is useful when there is
a desire to individually restore data for a specific application.
·Restoration through Label Selectors: Data can be filtered and restored based on the
labels. Through this feature, associated groups of resources can be effectively restored.
·Considering Resource Dependencies: The restoration process is performed by considering
the dependencies of the backed-up resources. For instance, resources such as PVC,
Deployment, ConfigMap, and Service related to a PV are restored sequentially based
on their dependencies, ensuring their proper function after restoration.
·Direct Restoration Method: Portworx Backup restores data directly from the original
storage location where it is backed up, without moving it elsewhere. This enhances
the restoration speed and reduces the costs associated with data movement.
·Command Execution: Users can set up rules to execute specific commands before and
after the restoration process. This allows the automation of additional tasks, such
as verifying existing data before restoration or sending notification messages after
restoration.
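Dependency-aware restoration amounts to ordering resources topologically. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the dependency table is an illustrative assumption that follows the example in the list above and is not taken from Portworx Backup.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Assumed dependencies: each kind is restored after the kinds it lists.
DEPENDS_ON: Dict[str, Set[str]] = {
    "PersistentVolume": set(),
    "PersistentVolumeClaim": {"PersistentVolume"},
    "ConfigMap": set(),
    "Deployment": {"PersistentVolumeClaim", "ConfigMap"},
    "Service": {"Deployment"},
}


def restore_order(resources: List[dict]) -> List[dict]:
    """Sort resources so that every kind follows the kinds it depends on."""
    kinds = {res["kind"] for res in resources}
    # Only consider dependencies on kinds that are present in this backup.
    graph = {kind: DEPENDS_ON.get(kind, set()) & kinds for kind in kinds}
    rank = {kind: position for position, kind
            in enumerate(TopologicalSorter(graph).static_order())}
    return sorted(resources, key=lambda res: rank[res["kind"]])


if __name__ == "__main__":
    backed_up = [{"kind": "Service", "name": "web"},
                 {"kind": "Deployment", "name": "web"},
                 {"kind": "PersistentVolumeClaim", "name": "data"},
                 {"kind": "PersistentVolume", "name": "pv-1"},
                 {"kind": "ConfigMap", "name": "cfg"}]
    print([res["kind"] for res in restore_order(backed_up)])
```

Kinds with no mutual dependency may appear in either relative order, so only the dependency constraints, not one fixed sequence, should be relied on.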
3.4 KubeDR
KubeDR [18] is a tool designed to protect crucial data within Kubernetes clusters. Primarily,
it backs up the Kubernetes objects stored in the ETCD and, optionally, the cluster certificates
to an S3 bucket. KubeDR follows the widely adopted ``Operator Pattern'' in Kubernetes,
which combines CRs with their corresponding controllers. Each CR
has its own unique controller that performs data backup and recovery operations. The
KubeDR operator uses webhooks to verify data validity and, if necessary, sets default
values for resource specifications. By doing so, KubeDR enhances the data integrity
and disaster recovery capabilities of Kubernetes clusters.
The key features of KubeDR include the following:
·Backup of ETCD Data and Certificates to S3: KubeDR backs up ETCD data and cluster
certificates to an S3 bucket. This backup securely protects vital cluster information.
·Backup Encryption and Deduplication: The backup is encrypted for security purposes,
and redundant data are eliminated to conserve the storage space.
·Backup Pause and Resume Functionality: KubeDR offers the capability to pause and
resume backup operations, thereby enabling backup adjustments during specific periods.
·Control of Retained Backups via Retention Settings: Users can set how long backups
are retained.
Because KubeDR requires direct access to the ETCD, it operates only in clusters with
accessible ETCD and the capability to take snapshots. This includes on-premises clusters
and clusters explicitly configured on compute instances in cloud environments.
3.4.1 Backup in KubeDR
Kubernetes stores all cluster data in a distributed store called ETCD, making it imperative
to periodically back up the ETCD data for disaster recovery (DR) and data loss prevention.
KubeDR performs backups based on the following features:
·Backup Target Specification: The backup target is an S3 bucket. Users define
the backup target by creating a Kubernetes custom resource called BackupLocation.
KubeDR then backs up data to the designated S3 bucket.
·Credential Management: Certificates and credentials used to access S3 for backup
are stored in Kubernetes ``secret'' resources. These secrets include the authentication
keys required for S3 access and the password used for encryption.
·Data Encryption: To ensure data protection, KubeDR encrypts the backup data. The
encrypted backup data, stored in the S3 bucket, offers enhanced security against data
breaches.
·Deduplication: Redundant backup data are eliminated to save storage space, prevent
the backup of identical data blocks multiple times, and ensure efficient storage use.
·Pause and Resume Functionality: Users can pause or resume backup operations as required,
allowing control over backup operations at specific times or in exceptional circumstances.
·Retention Policy Setup: Users can set a backup retention period, allowing backups
to be stored for specified durations. This feature facilitates the automatic deletion
or retention of old backup data.
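The deduplication feature in the list above can be illustrated with a content-addressed chunk store. The Python sketch below uses fixed-size chunks and SHA-256 digests purely for illustration; it is not KubeDR's implementation, the class and parameter names are hypothetical, and production tools may chunk data differently.

```python
import hashlib
from typing import Dict, List


class ChunkStore:
    """Store each distinct chunk once, keyed by its SHA-256 digest."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> List[str]:
        """Store data and return its recipe, the ordered list of chunk digests."""
        recipe: List[str] = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # An identical chunk seen before is not stored a second time.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        return recipe

    def get(self, recipe: List[str]) -> bytes:
        """Reassemble the original data from its recipe."""
        return b"".join(self.chunks[digest] for digest in recipe)


if __name__ == "__main__":
    store = ChunkStore(chunk_size=4)
    first = store.put(b"AAAABBBBCCCC")
    second = store.put(b"AAAABBBBDDDD")  # shares two chunks with the first
    print(len(store.chunks), store.get(second))
```

Two successive snapshots that differ in only a small part therefore add only the changed chunks to storage.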
3.4.2 Restoration in KubeDR
KubeDR supports two types of restoration: DR restoration and regular restoration.
DR restoration is used to configure a new cluster after a master node has been lost.
To perform a DR restoration, the following steps are taken:
·Backup Browsing: Users review the backup snapshots in the target S3 bucket and select
the snapshot ID to restore from.
·Restoration: Using the kubedrctl command or a Docker command, the data are restored
from the chosen snapshot ID. The restored data consist of ETCD snapshot files and,
optionally, certificate files. A new cluster is then configured with the restored
data.
Regular restoration is employed when the cluster is operational but access to the
certificates or the ETCD snapshot is required. To perform a regular restoration, the following
steps are taken:
·Backup Browsing: After each successful backup, KubeDR creates a resource named MetadataBackupRecord.
Through this resource, all backup snapshots are listed chronologically, allowing for
the selection of a backup to be restored from.
·Restoration Settings: Users create a PersistentVolumeClaim (PVC) to define the source
and target for the restoration. The PVC is the Kubernetes resource used to store the data
to be restored.
·Initiate Restoration: By creating a MetadataRestore resource, the restoration operation
is triggered. This allows KubeDR to restore the data from the backup snapshot and
save it to the PVC.
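The backup-browsing step reduces to choosing one snapshot from a chronological list of records. The Python sketch below models that choice; the record fields `snapshot_id` and `backup_time` are hypothetical and do not reproduce the schema of the MetadataBackupRecord resource.

```python
from datetime import datetime
from typing import List, Optional


def pick_snapshot(records: List[dict],
                  not_after: Optional[datetime] = None) -> str:
    """Return the ID of the newest snapshot, optionally bounded by a point in time."""
    candidates = [rec for rec in records
                  if not_after is None or rec["backup_time"] <= not_after]
    if not candidates:
        raise LookupError("no backup snapshot satisfies the time bound")
    return max(candidates, key=lambda rec: rec["backup_time"])["snapshot_id"]


if __name__ == "__main__":
    # Hypothetical records, one per successful backup.
    records = [
        {"snapshot_id": "a1b2", "backup_time": datetime(2024, 1, 1, 2, 0)},
        {"snapshot_id": "c3d4", "backup_time": datetime(2024, 1, 2, 2, 0)},
        {"snapshot_id": "e5f6", "backup_time": datetime(2024, 1, 3, 2, 0)},
    ]
    # Restore to the state before an incident on the afternoon of 2 January.
    print(pick_snapshot(records, not_after=datetime(2024, 1, 2, 12, 0)))
```

Bounding the selection by time is useful when the most recent backups may already contain the corrupted state.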
3.5 Comparative Analysis of Backup and Restoration Tools in Kubernetes
We conducted a comparative analysis of the backup and restoration tools by considering
ten criteria. The criteria and associated analyses are as follows.
·Application and Cluster Recovery Support: Velero uniquely supports both application
and cluster-level recovery. This support ensures the application integrity and cluster
stability in complex environments. In contrast, Kasten K10 and Portworx Backup only
support application recovery, which may not be suitable in situations where cluster
recovery is required. KubeDR exclusively supports cluster recovery, making it less
suitable when application recovery is crucial.
·Cloud Connectivity: Most backup and restoration tools can connect with various cloud
platforms, such as AWS, GCP, and Azure. This connectivity boosts cloud-to-cloud mobility
and provides flexibility in diverse environments.
·Backup Location Support: The backup location refers to the physical space in which
data are stored and is central to the backup strategy. Velero, Kasten K10, and Portworx
Backup can back up to both S3 and block stores, with Kasten K10 additionally supporting
file store backups. However, KubeDR is limited to backing up only to S3. Support
for backup locations plays a crucial role in determining data accessibility and integrity,
particularly when formulating DR strategies.
·Role-Based Access Control (RBAC) Support: RBAC is a crucial feature for enhancing
security, but only Kasten K10 and Portworx Backup support it. The lack of this
security feature in Velero and KubeDR can increase the risk associated with data access.
·Encryption Support: Most tools support authentication and data encryption, thereby
reinforcing data security. This is particularly significant in environments focusing
on personal data protection and regulatory compliance.
·User Interface: Kasten K10 and Portworx Backup offer user-friendly experiences with
their web UIs. Velero and KubeDR only support CLI, making them potentially unsuitable
for those seeking a more intuitive interface.
·Cost: Velero and KubeDR are available for free and offer cost-effective solutions.
Kasten K10 and Portworx Backup require paid licenses, necessitating budget considerations.
·Automatic Cluster Recovery Support: None of the tools supports automatic cluster recovery;
environments that require this feature must rely on manual intervention or additional
automation.
·Automatic Application Recovery Support: Kasten K10 supports automatic application
recovery, although its setup can be complex, and its functionalities are limited.
This might introduce challenges during the initial setup and may not meet specific
requirements.
Table 1 summarizes the comparative analysis of Kubernetes backup and restoration
tools. It indicates that choosing a tool requires a holistic consideration
of specific user requirements, budget, technical proficiency, and security needs.
In particular, the required level of recovery (cluster, application, or both) and
the available budget are paramount.
In conclusion, selecting and adequately implementing the most suitable tool according
to the system complexity and requirements maximizes data protection and operational
efficiency. Carefully weighing the various factors ensures that the chosen solution
aligns with the user’s goals.
Table 1. Comparative Analysis of Backup and Restoration Tools in Kubernetes.
| Criterion | Velero | Kasten K10 | Portworx Backup | KubeDR |
|---|---|---|---|---|
| Application-level Recovery Support | Supports Application-level Recovery | Supports Application-level Recovery | Supports Application-level Recovery | Does not support Application-level Recovery |
| Cluster-level Recovery Support | Supports Cluster-level Recovery | Does not support Cluster-level Recovery | Does not support Cluster-level Recovery | Supports Cluster-level Recovery |
| Supported Cloud Platforms | AWS, GCP, Azure, etc. | AWS, GCP, Azure, etc. | AWS, GCP, Azure, etc. | AWS, GCP, Azure, etc. |
| Backup Technology Support | Supports backup to S3 and block stores | Supports backup to S3, block stores, and file stores | Supports backup to S3 and block stores | Supports backup to S3 only |
| RBAC Support | Does not support multi-user RBAC | Supports RBAC | Supports RBAC | Does not support RBAC |
| Encryption | Supports authentication and authorization and optional data encryption | Supports authentication and authorization and data encryption | Supports authentication and authorization and data encryption | Supports authentication and authorization |
| User Interface | Supports CLI only | Supports both Web UI and CLI | Supports both Web UI and CLI | Supports CLI only |
| Cost | Open Source, Free | Requires License, Paid | Requires License, Paid | Open Source, Free |
| Automatic Cluster-level Recovery Support | Does not support Automatic Cluster-level Recovery | Does not support Automatic Cluster-level Recovery | Does not support Automatic Cluster-level Recovery | Does not support Automatic Cluster-level Recovery |
| Automatic Application-level Recovery Support | Does not support Automatic Application-level Recovery | Supports automation through user configuration | Does not support Automatic Application-level Recovery | Does not support Automatic Application-level Recovery |
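As an illustration of how the comparison in Table 1 can guide selection, the following Python sketch encodes a subset of the table as a lookup and filters the tools by a set of hard requirements. The feature keys and the function name are hypothetical, and the Boolean values are a simplified reading of Table 1, not an additional evaluation.

```python
# Simplified, Boolean reading of selected rows of Table 1.
TOOLS = {
    "Velero": {
        "app_recovery": True, "cluster_recovery": True,
        "rbac": False, "web_ui": False, "free": True,
    },
    "Kasten K10": {
        "app_recovery": True, "cluster_recovery": False,
        "rbac": True, "web_ui": True, "free": False,
    },
    "Portworx Backup": {
        "app_recovery": True, "cluster_recovery": False,
        "rbac": True, "web_ui": True, "free": False,
    },
    "KubeDR": {
        "app_recovery": False, "cluster_recovery": True,
        "rbac": False, "web_ui": False, "free": True,
    },
}


def shortlist(requirements):
    """Return the tools whose features match every stated requirement."""
    return [
        name
        for name, features in TOOLS.items()
        if all(features[key] == value for key, value in requirements.items())
    ]


free_cluster_recovery = shortlist({"cluster_recovery": True, "free": True})
app_recovery_with_rbac = shortlist({"app_recovery": True, "rbac": True})
print(free_cluster_recovery)
print(app_recovery_with_rbac)
```

A requirement set that no tool satisfies, such as cluster recovery combined with RBAC, returns an empty list, which mirrors the gap identified in the analysis.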
4. Multicluster Management Platform
A multicluster management platform provides an environment in which various clusters
and services are integrated and managed. As the complexity of cloud-based services
and applications increases, consistent management and deployment across multiple clusters
become essential.
The multicluster management platform enhances efficiency and flexibility in such intricate
settings, reducing operational burdens through resource optimization and automation.
As the number of clusters to be managed increases, the error probability increases,
and maintenance and monitoring become challenging without a centralized management
platform.
The adoption of a multicluster management platform simplifies the arduous task of
manually managing individual clusters, supporting consistent policy application, monitoring,
and stable deployments. Consequently, businesses and organizations leverage these
platforms to boost the efficiency of their IT infrastructure and reinforce the competencies
that are crucial for achieving business objectives.
The following subsections provide a detailed description and comparative analysis
of leading multicluster management platforms, namely Rancher, KubeSphere, Razee,
and OpenShift. We examine the unique features and functionalities of each platform
and compare them so that users can gain insight into selecting the most suitable
multicluster management platform for their Kubernetes environment.
4.1 Rancher
Rancher [19] is a container management platform that operates Kubernetes clusters in on-premises,
cloud, or edge environments. Rancher is well suited to multicluster, hybrid, and multicloud
container scenarios. It centralizes authentication and RBAC for all clusters, enabling
global administrators to control cluster access from a single location. Rancher can
import and manage clusters through a single interface, apply consistent security policies,
monitor logs, and oversee the performance of all clusters.
4.1.1 Architecture of Rancher
Fig. 3 illustrates the structure of a Rancher server installation managing one Kubernetes
cluster configured with the Rancher Kubernetes Engine and another Kubernetes cluster
configured with the Amazon Elastic Kubernetes Service. The description of the functions
of each Rancher server component is as follows:
Fig. 3. Architecture of Rancher.
·Authentication Proxy (Auth Proxy): The Auth Proxy is integrated with Rancher’s authentication
services to manage user authentication. Before passing the Kubernetes API calls to
the underlying clusters, it authenticates the caller and sets the appropriate Kubernetes
impersonation headers to securely relay the request. Users can access the resources
in the underlying clusters through the Authentication Proxy using either the Rancher
UI or kubectl commands.
·Rancher API Server: This is the central component of Rancher, responsible for managing
Rancher resources, such as clusters, projects, users, and apps. Through interfaces
such as the Rancher UI, users can perform cluster-related operations using the Rancher
API.
·Cluster Controller: The Cluster Controller monitors state changes for each underlying
cluster and manages cluster resources to transition them to the desired state. It
also configures access control policies for clusters and projects and provisions clusters.
Each underlying cluster has a Cluster Controller that reports the state of the cluster
back to the Rancher server.
·Cluster Agent: Operating within the underlying cluster, this Rancher component communicates
with Rancher and manages resources within the cluster. The Cluster Agent connects
to the Kubernetes API of Rancher-launched Kubernetes clusters. It manages workloads,
creates and deploys pods, applies policies, and communicates events, metrics, node
information, and state between the cluster and the Rancher server.
Each underlying cluster has one Cluster Agent that connects to the Cluster Controller.
Each of these components supports Rancher functionalities and plays an essential role
in cluster management. The Authentication Proxy is responsible for user authentication,
the Rancher API Server provides core Rancher functions, and the Cluster Controller
and Cluster Agent monitor and manage the underlying clusters.
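The impersonation step performed by the Authentication Proxy can be sketched in Python as follows. `Impersonate-User` and `Impersonate-Group` are the standard Kubernetes impersonation header names; the function names and the request representation are hypothetical, and the sketch is not Rancher's implementation.

```python
def impersonation_headers(user, groups=()):
    """Build Kubernetes impersonation headers as (name, value) pairs.

    A list of pairs is used because Impersonate-Group may appear once
    per group.
    """
    if not user:
        raise ValueError("an authenticated user name is required")
    headers = [("Impersonate-User", user)]
    headers.extend(("Impersonate-Group", group) for group in groups)
    return headers


def relay_headers(request_headers, user, groups):
    """Drop caller-supplied impersonation headers, then add trusted ones.

    `user` and `groups` are assumed to come from the proxy's own
    authentication step, never from the incoming request.
    """
    cleaned = [
        (name, value)
        for name, value in request_headers
        if not name.lower().startswith("impersonate-")
    ]
    return cleaned + impersonation_headers(user, groups)


incoming = [("Accept", "application/json"), ("Impersonate-User", "admin")]
outgoing = relay_headers(incoming, "alice", ["dev"])
print(outgoing)
```

Stripping any impersonation headers supplied by the caller before adding the authenticated identity prevents a user from claiming another identity through the proxy.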
4.1.2 Backup and Restoration of Rancher
The backup and restoration functionalities of Rancher are composed of the following
elements:
1. Backup Creation
·Periodic Backup: Rancher takes periodic snapshots based on a user-defined schedule.
These backups preserve the critical states of the Rancher Server and cluster data,
thereby facilitating DR.
·One-time Backup: Users can manually initiate backups, as needed. This is beneficial
for taking additional backups before significant changes occur or in anticipation
of unforeseen events.
2. Backup Configuration
·Snapshot Contents: The backup includes data from the Rancher Server, as well as the
state and configuration information of the Kubernetes cluster. This allows users to
complete the restoration of the Rancher system.
·Diverse Options: During backup, users can select only the data that they require.
For instance, they can choose to back up only the ETCD data, enabling the recovery
of vital cluster information.
3. Backup Storage
·Local Storage: By default, backup data are stored on the local disk of the node where
the Rancher Server runs. Local backups offer a simple configuration and facilitate
swift backups and restoration.
·S3-Compatible Storage: Instead of local storage, users can configure S3-compatible
storage to remotely store backup data. This ensures that, even if all ETCD nodes are
lost, the cluster can still be restored using remote snapshots.
4. Restoration
·Snapshot-based Restoration: Users can restore both the Rancher Server and cluster
using the created backup snapshots. This allows users to quickly revert the system
to its original state in the event of unexpected disruption or data loss.
·Detailed Restoration Options: If needed, users can opt to restore only ETCD data
or include Kubernetes versions and cluster configuration information. This is useful
for restoring specific parts of the cluster.
Although Rancher’s backup and restoration functions do not directly support application-level
backup and restoration, they reliably manage the Rancher system itself, ensuring the
safety and availability of applications. Through backup and restoration, users can
prepare for potential disaster scenarios and swiftly restore their clusters to their
original states. This enables users to utilize a secure and reliable Kubernetes environment
based on Rancher.
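The trade-off between local and S3-compatible storage described above can be expressed as a simple fallback rule. The following Python sketch is an illustration under stated assumptions: the function name, the snapshot names, and the `nodes_lost` flag are hypothetical, and the logic is not taken from Rancher.

```python
def pick_snapshot(local, remote, nodes_lost):
    """Choose the snapshot to restore from and report where it is stored.

    `local` and `remote` are lists of snapshot names ordered oldest to
    newest. Local snapshots are preferred for speed, but they are unusable
    when the nodes that hold them have been lost.
    """
    if not nodes_lost and local:
        return ("local", local[-1])
    if remote:
        return ("s3", remote[-1])
    raise RuntimeError("no usable snapshot: local copies lost and no remote copy")


local_snaps = ["snap-0827", "snap-0828", "snap-0829"]
remote_snaps = ["snap-0827", "snap-0828"]

normal_case = pick_snapshot(local_snaps, remote_snaps, nodes_lost=False)
disaster_case = pick_snapshot(local_snaps, remote_snaps, nodes_lost=True)
print(normal_case, disaster_case)
```

In the disaster case the newest remote snapshot may be older than the newest local one, which is why the upload frequency to remote storage bounds the achievable recovery point.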
4.2 KubeSphere
KubeSphere [20] is an enterprise-grade container management platform that leverages Kubernetes and
cloud-native technologies based on open-source code. This platform enables the easy
use of Kubernetes, even in complex cluster environments, and effectively supports
the deployment, operation, and management of containerized applications. It provides
all the tools and features required to integrate and manage Kubernetes clusters in
various cloud and on-premise environments, allowing for the efficient allocation and
central monitoring of resources.
4.2.1 Key Features of KubeSphere
KubeSphere offers a range of functionalities including the following:
·Multitenancy: Allows various teams or projects to share a single Kubernetes cluster
while operating independently. This facilitates optimal resource utilization, and
each team or project can work autonomously within their namespaces.
·DevOps Support: KubeSphere provides features for continuous integration (CI) and
continuous deployment (CD). This allows development and operations teams to manage the
entire lifecycle of an application efficiently. Such DevOps capabilities automate
the process from development to deployment, enabling swift and stable software rollouts.
In addition, KubeSphere provides a plethora of features, such as inter-cluster networking,
service mesh, high scalability, and support for various storage and network plugins.
Through these capabilities, users can simplify intricate tasks related to Kubernetes
and effectively oversee the cluster operations.
4.2.2 Backup and Restoration in KubeSphere
KubeSphere does not offer backup or restoration functionalities directly. Instead,
it supports the integration of separate backup and restoration tools. For example,
it can be used in conjunction with Velero to enable backup and restoration of both
clusters and applications.
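As a sketch of such an integration, the following Python helpers assemble Velero command lines for a namespace-scoped backup and a restore from that backup. The Velero subcommands and flags shown (`backup create`, `--include-namespaces`, `restore create`, `--from-backup`) are part of the Velero CLI; the helper names, backup name, and namespaces are hypothetical. The commands are only constructed here, not executed.

```python
import shlex


def velero_backup_cmd(backup_name, namespaces):
    """Build the argv for backing up the given namespaces with Velero."""
    return [
        "velero", "backup", "create", backup_name,
        "--include-namespaces", ",".join(namespaces),
    ]


def velero_restore_cmd(backup_name):
    """Build the argv for restoring from an existing Velero backup."""
    return ["velero", "restore", "create", "--from-backup", backup_name]


backup_cmd = velero_backup_cmd("nightly", ["shop", "billing"])
restore_cmd = velero_restore_cmd("nightly")
print(shlex.join(backup_cmd))
print(shlex.join(restore_cmd))
```

In practice, each list could be passed to `subprocess.run(cmd, check=True)` on a host where Velero is installed and configured for the target cluster.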
4.3 Razee
Razee [21] is an open-source project developed by IBM to automate the deployment
and management of Kubernetes resources across multicluster environments. It addresses
the need to scale Kubernetes consistently across multiple clusters, environments,
and cloud providers.
Razee comprises three primary, loosely coupled modules: RazeeDash, RazeeDeployables,
and RazeeDeploy. Each module caters to different needs within a cluster and can be
used independently as required.
RazeeDash focuses on visualizing deployment information by dynamically generating
a real-time inventory of the Kubernetes resources. In turn, RazeeDeploy and RazeeDeployables
emphasize the efficient templating of Kubernetes resources and the automation of resource
deployment and management in multicluster environments.
Through this configuration, Razee offers an integrated platform for businesses and
organizations to simplify resource deployment and management across clusters while
also providing clear insights into cluster status and deployment conditions.
4.3.1 Key Features of Razee
Razee offers several features, as follows:
·Cluster Inventory Management: By utilizing RazeeDash and Watchkeeper, users can add
clusters to the Razee inventory list. This feature allows for the monitoring of Kubernetes
resource deployment statuses using intelligent filters and alerts, providing real-time
visualization and management of each cluster’s current state and configuration information.
·CD Across Clusters and Environments: Using RazeeDeployables, users can control and
automate their Kubernetes resource deployment across clusters and environments. They
can leverage this by adding all clusters to the Razee inventory and subscribing clusters
to publishing channels that contain the desired Kubernetes resource versions.
·Templating Kubernetes Resources: RazeeDeploy includes custom resource definitions
(CRDs) that assist users in dynamically generating Kubernetes resources based on set
feature flags or variables. This helps group resources and automatically applies them
to clusters.
·RemoteResource & RemoteResourceS3: These CRDs and controllers are utilized to automatically
deploy Kubernetes resources stored in source repositories.
·MustacheTemplate: A CRD and controller defining environment variables that can replace
specific parts within other Kubernetes YAML files.
·FeatureFlagSetLD: A CRD and controller that automatically fetches feature flag values
from LaunchDarkly. This enables users to control the code deployed to clusters and
manage multiple versions of Kubernetes resources across various clusters, environments,
or clouds.
·ManagedSet: A CRD and controller grouping Kubernetes resources intended for simultaneous
creation and application to clusters.
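The idea behind templating Kubernetes resources with per-cluster variables can be illustrated with a minimal mustache-style substitution in Python. This is a sketch of the general technique only; it is not Razee's MustacheTemplate controller, and the function name, template text, and values are hypothetical.

```python
import re

# Matches placeholders such as {{ replicas }} or {{version}}.
_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_.-]*)\s*\}\}")


def render(template, values):
    """Replace each {{ name }} with values[name].

    Raises KeyError when a placeholder has no value, so that an incomplete
    resource is never produced silently.
    """
    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for placeholder {key!r}")
        return str(values[key])

    return _PLACEHOLDER.sub(substitute, template)


template = "replicas: {{ replicas }}\nimage: shop:{{version}}\n"
rendered = render(template, {"replicas": 3, "version": "1.4.2"})
print(rendered)
```

Rendering the same template with different value sets is what allows one resource definition to be reused across clusters and environments.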
4.3.2 Backup and Restoration in Razee
Razee is primarily a tool specialized for the deployment and management of Kubernetes
resources and does not inherently offer backup and restoration functionalities. However,
open-source tools such as Velero can be used to securely back up and restore Kubernetes
resources and data. By integrating these tools with Razee, resources deployed via
Razee can be safely archived and subsequently restored. Such integration is essential
for enhancing cluster stability and reinforcing the protection of data in the clusters.
4.4 OpenShift
OpenShift [22] is a comprehensive container application platform that inherently integrates the
Kubernetes container cluster management and orchestration system and multicluster
management technology, all combined on an enterprise foundation of Red Hat Enterprise
Linux.
OpenShift facilitates the rapid construction, development, and deployment of applications
on almost any infrastructure type, from public to private clouds and on-premises, without
being restricted to specific application architectures. Enterprises can rapidly commercialize
their ideas owing to this flexibility. In essence, OpenShift provides an integrated
operational environment supporting Kubernetes by employing Docker containers and DevOps
tools.
4.4.1 Key Features of OpenShift
The primary features of OpenShift include the following:
·Container Orchestration: Based on Kubernetes, OpenShift automates the placement,
scheduling, and management of containers. Through container orchestration, it supports
the automated deployment and scaling of applications, maintaining their availability
and performance.
·CI/CD Pipeline: When developers commit application code to a version control system
within OpenShift, it is automatically built, tested, and deployed. This enables continuous
integration and deployment and fosters collaboration between development and operations
teams.
·Multicloud Support: OpenShift offers a consistent experience across various environments,
including public clouds, on-premises, hybrid clouds, and edge architectures. Users
can deploy applications flexibly and efficiently across multiple clouds and on-premise
resources.
·Support for Stateful and Stateless Applications: OpenShift supports both stateful
and stateless applications by directly connecting persistent storage to Linux containers.
·Security Features: OpenShift provides a secure environment by isolating containers
and applying security policies. Furthermore, it offers detailed access control to
resources through policy-based permissions, ensuring application and data security.
·Automation and Management Features: OpenShift’s master server automatically manages
pods, handling installation, load monitoring, error detection, and general monitoring,
thus reducing the burden on the operational team and ensuring application stability
and availability.
4.4.2 Backup in OpenShift
Backup and restoration in OpenShift are crucial tasks for ensuring the stability of
important data and applications. Backups safeguard data from system failures, data
losses, and user errors, whereas restoration restores data and systems after such
events. Various methods can be employed for backup in OpenShift:
·Backing up Persistent Volumes: OpenShift uses persistent volumes (PVs) to retain application
data. To back up the PV data, one can either use the backup tool of the associated storage
backend or migrate the PV to a different storage class and use its backup feature.
·Backing up the ETCD Database: OpenShift cluster information and configuration are stored
in the ETCD database, a crucial component requiring regular backups. ETCD snapshots
can be created as backups to ensure data recovery in the case of system failures.
·Application Code Backup: Application code in OpenShift is stored in code repositories,
e.g., Git. The application code is backed up through regular backups
of the Git repository.
·Backup of OpenShift Cluster Configuration: OpenShift cluster configuration is another
important component. Backing up the cluster configuration file ensures the restoration
of the cluster state.
·Backup of OpenShift Console: The OpenShift console, a web interface for cluster management
and monitoring, also requires backups.
4.4.3 Restoration in OpenShift
·Cluster-Level Restoration: Cluster-level restoration is carried out using backup
snapshots of the ETCD database, which contains the cluster’s configuration and state
information. This restoration process returns the cluster back to its previous state.
·Application-Level Restoration: Application data in OpenShift are stored using PV.
Restoration at the application level is performed using the backup data of the PV
either by restoring the data from the storage backend or by migrating the PV to a
different storage class and then restoring the data.
·Restoring Application Code and Configuration: Application code in OpenShift is stored
in code repositories, such as Git. Restoration of the application code and configuration
is achieved by rolling back the code in the Git repository or reverting the cluster
configuration file to a previous version.
4.5 Comparative Analysis of Multicluster Management Platform
We conducted a comparative analysis of multicluster management platforms by considering
eight criteria. The criteria and associated analyses are described as follows:
·Restoration of Application Code and Configuration: In OpenShift, the application
code is stored in code repositories, such as Git, and the cluster configuration is
kept in configuration files. The application code and its configuration are restored
by either reverting the code in the Git repository to a previous state or rolling
back the cluster configuration file to an earlier version.
·Cloud Connectivity: All platforms can connect to various cloud platforms such as
AWS, GCP, and Azure. Such connectivity enhances cloud mobility and offers flexibility
in diverse environments. Cloud integration is an essential aspect of multicluster
management and is indispensable in modern IT environments.
·Backup Location Support: OpenShift supports various backup locations and offers a
range of options through integration with other platforms. Rancher supports
backup to S3 and local storage, whereas KubeSphere and Razee provide no native backup
and must rely on third-party tools. The
backup location plays a crucial role in DR strategies, and a broader range of choices
offers more flexibility.
·RBAC Support: All platforms support Role-Based Access Control, enabling user-specific
permission management. This strengthens security and enhances team collaboration efficiency.
Support for RBAC is becoming a standard in cluster management today, elevating both
management convenience and security.
·Encryption Support: Authentication and data encryption are fundamentally provided
across all platforms and are particularly essential in environments that focus on
personal information protection and regulatory compliance. Encryption support ensures
data integrity and security, reducing the risk of sensitive information breaches.
·User Interface: Every platform offers support for both the web UI and CLI, providing
a user-friendly interface. The user interface lowers barriers to platform usage and
facilitates convenient management and monitoring.
·Cost: OpenShift incurs costs depending on the enterprise version, whereas the other
platforms are provided as open source. Cost considerations are particularly important
for small-to-medium enterprises and budget-restricted projects. Choosing a solution
that delivers optimal performance at a minimal cost is essential.
·Automatic Recovery Feature: None of the platforms supports automatic recovery, so
the recovery process must be managed manually. Because cluster and application recovery
are critical tasks, this limitation should be considered when selecting a platform.
Table 2 presents a comparative analysis of multicluster management platforms. In conclusion,
the choice of a multicluster management platform must take into account the users’ specific
requirements, technological capabilities, and security needs. Recovery support, backup location,
interface convenience, and cost are significant considerations. Selecting and appropriately
implementing the most suitable platform based on the system’s complexity and requirements
is key to maximizing operational efficiency.
Table 2. Comparative Analysis of Multicluster Management Platforms.
| Criterion | Rancher | KubeSphere | Razee | OpenShift |
|---|---|---|---|---|
| Application-level Recovery Support | Does not support Application-level Recovery | Does not support Application-level Recovery | Does not support Application-level Recovery | Supports Application-level Recovery |
| Cluster-level Recovery Support | Supports Cluster-level Recovery | Does not support Cluster-level Recovery | Does not support Cluster-level Recovery | Supports Cluster-level Recovery |
| Supported Cloud Platforms | AWS, GCP, Azure, etc. | AWS, GCP, Azure, etc. | AWS, GCP, Azure, etc. | AWS, GCP, Azure, etc. |
| Backup Technology Support | Supports S3 and local storage | Does not support (Users must perform backups using third-party tools) | Does not support (Users must perform backups using third-party tools) | Supports backup to S3 and block stores, and integrates with other storage solutions |
| RBAC Support | Supports multi-user RBAC | Supports multi-user RBAC | Supports multi-user RBAC | Supports multi-user RBAC |
| Encryption | Supports authentication and authorization, with data encryption | Supports authentication and authorization, with data encryption | Supports authentication and authorization, with data encryption | Supports authentication and authorization, with data encryption |
| User Interface | Supports both Web UI and CLI | Supports both Web UI and CLI | Supports both Web UI and CLI | Supports both Web UI and CLI |
| Cost | Open Source, Free, and Commercial versions available | Open Source, Free | Open Source, Free | Depends on the subscription (Enterprise versions are not free) |
| Automatic Cluster-level Recovery Support | Does not support Automatic Cluster-level Recovery | Does not support Automatic Cluster-level Recovery | Does not support Automatic Cluster-level Recovery | Does not support Automatic Cluster-level Recovery |
| Automatic Application-level Recovery Support | Does not support Automatic Application-level Recovery | Does not support Automatic Application-level Recovery | Does not support Automatic Application-level Recovery | Does not support Automatic Application-level Recovery |
5. Future Work
We present future research and development directions for multicluster management.
There is a growing need for a system that integrates multicluster management platforms
offering cluster status verification with backup and restoration tools offering recovery
at both the cluster and application levels, and that adds automated recovery functions.
Such a system would enable automatic recovery at both the cluster and application
levels in the event of a disaster, further enhancing operational efficiency and
system integrity. With a more detailed analysis and the addition of recovery automation
features, a more robust and trustworthy system can be constructed from integrated
multicluster management and backup and restoration tools. Such research is significant
because it can move beyond technical limitations to
meet an organization’s strategic objectives.
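One possible shape for such automation is a loop that combines cluster status verification with a restore action. The following Python sketch is an assumption-laden illustration rather than a proposed design: the health results are given as a sequence of Booleans standing in for real probes, `restore` is a placeholder callable standing in for a backup tool's restore operation, and the threshold value is arbitrary.

```python
def monitor(health_checks, restore, failure_threshold=3):
    """Trigger `restore` whenever `failure_threshold` consecutive checks fail.

    `health_checks` is an iterable of booleans (True means healthy).
    Returns the number of restores that were triggered.
    """
    consecutive_failures = 0
    restores = 0
    for healthy in health_checks:
        if healthy:
            consecutive_failures = 0
            continue
        consecutive_failures += 1
        if consecutive_failures >= failure_threshold:
            restore()
            restores += 1
            # Start counting afresh after a recovery attempt.
            consecutive_failures = 0
    return restores


events = []
probe_results = [True, False, False, True, False, False, False, True]
triggered = monitor(probe_results, lambda: events.append("restore"))
print(triggered, events)
```

Requiring several consecutive failures before restoring avoids reacting to a single transient probe failure, at the cost of a longer detection time.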
6. Conclusion
This study conducts a comprehensive comparative analysis of multicluster management
platforms and backup and restoration tools to aid business organizations in adopting
proper DR tools in cloud-native environments. Our analysis goes beyond mere functional
comparisons, leading to proposals for fruitful future research to bridge the gap between
diverse user requirements, including security issues, and existing tools.
ACKNOWLEDGMENTS
This work was supported by an Institute of Information & Communications Technology
Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-02082,
CDM_Cloud: Multicloud Data Protection and Management Platform).
This work was supported by 2023 Hongik University Innovation Support Program Fund.
REFERENCES
[1] G.-Z. Sun, Y. Dong, D.-W. Chen, and J. Wei, “Data Backup and Recovery Based on Data
De-Duplication,” in 2010 International Conference on Artificial Intelligence and Computational
Intelligence, Sanya, China: IEEE, Oct. 2010, pp. 379-382.
[2] P. Menard, R. Gatlin, and M. Warkentin, “Threat Protection and Convenience: Antecedents
of Cloud-Based Data Backup,” Journal of Computer Information Systems, vol. 55, no.
1, pp. 83-91, Sep. 2014.
[3] A. A. Tamimi, R. Dawood, and L. Sadaqa, “Disaster Recovery Techniques in Cloud Computing,”
2019 IEEE Jordan International Joint Conference on Electrical Engineering and Information
Technology (JEEIT), 2019, pp. 845-850.
[4] S. Prakash, S. Mody, A. Wahab, S. Swaminathan, and R. Paramount, “Disaster recovery
services in the cloud for SMEs,” 2012 International Conference on Cloud Computing
Technologies, Applications and Management (ICCCTAM), 2012, pp. 139-144.
[5] S. Suguna and A. Suhasini, “Overview of Data Backup and Disaster Recovery in Cloud,”
in International Conference on Information Communication and Embedded Systems (ICICES2014),
Chennai, India: IEEE, Feb. 2014, pp. 1-7.
[6] “Kubernetes.” (accessed Aug. 28, 2023).
[7] S. De Sameer and R. Prashant Singh, “Selective Analogy of Mechanisms and Tools in
Kubernetes Lifecycle for Disaster Recovery,” in 2022 IEEE 2nd International Conference
on Mobile Networks and Wireless Communications (ICMNWC), Tumkur, Karnataka, India:
IEEE, Dec. 2022, pp. 1-6.
[8] “Velero.” (accessed Aug. 29, 2023).
[9] “K10 Overview—K10 6.0.6 documentation.” (accessed Aug. 29, 2023).
[10] A. A. Tamimi, R. Dawood, and L. Sadaqa, “Disaster Recovery Techniques in Cloud Computing,”
2019 IEEE Jordan International Joint Conference on Electrical Engineering and Information
Technology (JEEIT), 2019, pp. 845-850.
[11] X. Yu, D. Wang, X. Sun, B. Zheng, and Y. Du, “Design and Implementation of a Software
Disaster Recovery Service for Cloud Computing-Based Aerospace Ground Systems,” in
2022 11th International Conference on Communications, Circuits and Systems (ICCCAS),
Singapore, Singapore: IEEE, May 2022, pp. 220-225.
[12] S. Challagidad, S. Dalawai, and N. Birje, “Efficient and Reliable Data Recovery Technique
in Cloud Computing,” Internet of Things and Cloud Computing, vol. 5, no. 1, pp. 13-18,
2017.
[13] J. Yu and L. Yang, “The Cloud Technology Double Live Data Center Information System
Research and Design based on Disaster Recovery Platform,” Procedia Engineering, vol.
174, pp. 1356-1370, 2016.
[14] L. Wang, R. E. Harper, R. Mahindru, and H. V. Ramasamy, “Disaster Recovery for Cloud-Hosted
Enterprise Applications,” in the 9th IEEE International Conference on Cloud Computing
(CLOUD), pp. 432-439, 2016.
[15] A. Poniszewska-Marańda and E. Czechowska, “Kubernetes Cluster for Automating Software
Production Environment,” Sensors, vol. 21, no. 5, pp. 1910, Mar. 2021.
[16] “Persistent Volumes,” Kubernetes. (accessed Dec. 04, 2022).
[17] “Portworx Backup Documentation.” (accessed Aug. 29, 2023).
[18] “2. Overview—KubeDR documentation.” (accessed Aug. 29, 2023).
[19] “Rancher Brand Guidelines & Resources,” Rancher Labs. (accessed Dec. 04, 2022).
[20] “Documentation.” (accessed Aug. 29, 2023).
[21] “Razee.” (accessed Aug. 29, 2023).
[22] “Red Hat OpenShift Enterprise Kubernetes Container Platform.” (accessed Aug. 29,
2023).
Jibeom Kim received a Bachelor’s degree in Computer Science and Information Communication
Engineering from Hongik University in Sejong, South Korea, in 2022. He enrolled in
a Master’s program in 2022. He researched distributed systems and Kubernetes disaster
recovery, and actively participated in a research project (CDM_Cloud: Multicloud Data
Protection and Management Platform) supported by the IITP (Institute for Information
& Communication Technology Planning & Evaluation).
Eun-Sung Jung received a Bachelor’s degree in Electrical Engineering from Seoul
National University in 1996 and a Master’s degree in Electrical Engineering in 1998.
In 2010, he obtained a Ph.D. in Computer Engineering from the University of Florida.
From 1998 to 2000, he worked as a Research Engineer at LG Industrial Systems, and
from 2000 to 2005, he was a Team Leader at MacroImpact. From 2011 to 2012, he held
the position of Principal Engineer at Samsung Advanced Institute of Technology. Furthermore,
he worked as a PostDoc at Argonne National Lab from 2013 to 2016, and during the summers
of 2016, 2017, 2018, and 2019, he actively participated as a Visiting Faculty in the
Faculty Research Program at Argonne National Lab. Since 2016, he has been employed
as an Associate Professor at Hongik University, Sejong, South Korea. Prof. Jung is
a member of IEICE and a Senior Member of IEEE. In 2011, he served as a journal reviewer
for IEEE Transactions on Parallel and Distributed Systems, IEEE Transactions on Computers,
and many other journals.