
  1. (Department of Software and Communications Engineering, Hongik University / Sejong, Korea {kjbeom, ejung}@hongik.ac.kr )



Disaster recovery, Backup, Restore, Cluster

1. Introduction

Cloud computing environments that provide computing services, such as servers and storage over the internet, offer flexibility and efficiency to numerous businesses and organizations. However, they also risk potential disasters such as data loss, security breaches, and service interruptions. In these situations, data backup for disaster recovery plays a critical role.

Beyond recovering from system errors or hardware failures, data backups provide organizations with several benefits. First, a backup can meet legal obligations to comply with specific laws or regulations regarding data protection and backup in certain industries. Second, backups can play a pivotal role in the face of security threats. The ability to rapidly recover from data security threats, such as ransomware or hacking attacks, ensures that user data are protected, enhancing trust and reliability [1,2]. Quick data recovery meets the Recovery Time Objective and minimizes disruptions to crucial business processes. Third, unforeseen natural disasters such as earthquakes, floods, and fires can strike, in which situations a backup strategy can safeguard the data [3,4].

In summary, data backup for disaster recovery in a cloud-computing environment is not merely a mechanism for protecting data. It serves critical functions in various aspects, such as business strategy, legal compliance, security, innovation, and reliability enhancement. Considering these multifaceted aspects, data backup can be deemed an essential component for the successful operation of contemporary organizations, and an efficient backup strategy can support their survival and growth [5]. The environment in which a disaster recovery backup strategy is executed is of utmost importance. Currently, most cloud environments use container-based infrastructures that require container orchestration tools for effective management. Kubernetes [6] is the most widely used container orchestration tool owing to its scalability, automation, and compatibility with various cloud providers. Therefore, Kubernetes plays a crucial role in disaster recovery backup strategies. By employing this tool, a complex infrastructure can be easily managed, offering the flexibility to rapidly recover from failures. Kubernetes can automatically deploy and manage ``containerized'' applications. ``Containerization'' refers to packaging an application and its runtime environment into an isolated unit called a ``container,'' thereby simplifying processes like development, testing, and deployment while also allowing for efficient resource usage.

However, a containerized environment has a complex structure, in which various containers interact. Kubernetes backups must address this complexity and provide stability. Effective backups and restorations of applications, configurations, and data are crucial, and there should be capabilities to recover quickly in case of failure. Moreover, backups via Kubernetes can be carried out consistently not only in cloud environments but also in on-premises environments, where IT infrastructure is managed in a company’s own data center or on local servers, and in hybrid environments. This simplifies traditional complex backup methods while ensuring cloud flexibility and scalability. Typical tools for disaster recovery in a Kubernetes environment include the ``Kubernetes Backup and Restore Tool,'' which primarily focuses on recovering resources and settings in a single Kubernetes cluster environment, and the ``Recovery Function in a Multicluster Management Platform,'' which centers on recovering resources and settings in environments managing multiple Kubernetes clusters.

Backup and restoration tools for Kubernetes have become increasingly important owing to the growing complexity of cloud-native infrastructure and applications. As complexity increases, so does the risk of failure; hence, such tools can safely preserve the data, configurations, and service states of Kubernetes clusters. With this conserved information, the system can be swiftly and reliably restored upon encountering issues. The key features and functionalities of such systems are as follows:

·Automated Cluster Recovery Support: Not every tool offers automated cluster recovery. Manual intervention or additional automation is necessary in environments requiring this feature.

·Data Transfer and Restoration between Clusters: Data can be transferred and restored between clusters for business continuity, ensuring high availability and data consistency.

·Various features such as real-time data snapshots, data encryption, and setting up automatic recovery procedures are offered to support data management and security in cloud environments.

Kubernetes backup and restore tools offer flexibility, addressing diverse and complex requirements. These designed features ensure enhanced data stability and service reliability for the users.

Furthermore, the multicluster management platform offers a solution that allows the integrated management of diverse Kubernetes clusters and services, especially as centralized management becomes challenging owing to the increasing complexity of modern cloud services and applications. The primary features and functionalities of this platform include the following:

·Offering consistent management of resource allocation, access rights, and policy settings across multiple clusters to enhance efficiency and flexibility.

·Automating repetitive tasks, such as scaling infrastructure, updating, and monitoring, to reduce operational complexity.

·Automating data synchronization and backups between various clusters and applications to ensure data security and consistency.

The multicluster management platform, in addition to providing integrated management of various clusters, reduces the administrative burden on organizations’ IT infrastructure through several automated features. This ultimately leads to a more effective achievement of business objectives.

While existing studies primarily focus on comparing and evaluating Kubernetes backup and restore tools, our research analyzes and contrasts both the Kubernetes backup and restore tools and the recovery features of multicluster management platforms, and thereby provides the specific criteria necessary for tool selection. Our research stands out for addressing both cluster-level and application-level recovery. It investigates the feasibility and necessity of developing an automated recovery system by integrating the recovery capabilities of Kubernetes recovery tools and multicluster management platforms. Future research will explore the possibility of a system that automatically restores Kubernetes clusters and their internal applications in a multicluster environment by integrating these two types of recovery tools and introducing recovery automation functions.

This paper is structured as follows: Section 2 reviews relevant studies to present the significance of disaster recovery in the Kubernetes environment. Section 3 introduces the functionalities and characteristics of various backup and restoration tools and compares them. Section 4 investigates and compares the recovery features and attributes of the multicluster management platforms. Section 5 discusses directions for future research, focusing on a system that automatically restores both managed clusters and applications. Integrating automatic recovery can overcome the existing system limitations and significantly enhance the efficiency and reliability of data backup and restoration. Finally, Section 6 concludes the study by discussing its expected impact on this field of research.

2. Related Work

Sammer [7] conducted a thorough analysis and comparison of the mechanisms and tools necessary for the backup and restoration of Kubernetes-based systems. The study examines in particular the practicality and efficiency of two primary backup and restoration tools, namely Velero [8] and Kasten K10 [9]. Both tools were evaluated on various fronts, including different backup strategies, volume snapshot functionalities, cloud compatibility, and user-permission settings (RBAC).

Tamimi [10] provided a detailed account of the selection, implementation, and application of backup and restoration tools in real-world industrial settings. The study describes various strategies and configurations that enable continuous data protection and swift restoration. Furthermore, the study extensively considers several technical factors when choosing backup and restoration tools, including data encryption, access control, compatibility with various cloud service providers, and locations for storing backup data. This comprehensive analysis proposes criteria for selecting the most efficient and secure backup and restoration solution. The paper also examined how the lifecycle of Kubernetes and the backup and restoration process interact, assisting DevOps teams in selecting and implementing the appropriate backup and restoration strategy at various system stages. This is particularly valuable when developing and maintaining deployment models for cloud-native applications and APIs.

Yu [11] focused on application recovery automation in cloud-computing-based aerospace ground systems. He developed and presented a recovery service for software-failure scenarios. The recovery service mainly emphasized application recovery automation, aiming to bolster stability and availability in a cloud-computing environment. This paper delves into the technical specifics related to various software recovery strategies and presents experimental results evaluating software recovery time and capability. An automatic recovery feature was proposed and designed for applications in response to software failures in a cloud-computing environment, offering recovery strategies for diverse application scenarios. The study verified the efficacy and performance of these recovery strategies through actual experiments and introduced an automated method to maintain the persistence of access portals after application recovery, ensuring the continuity of upstream business without interruptions. Users can continue using the service without considering the recovery process. This research differentiates itself by its focus on application recovery automation in a cloud environment, whereas most previous studies [12-14] were primarily concentrated on data and system recovery. Thus, it offers measures for enhancing the stability and availability of applications in cloud environments.

Each of these studies provided an in-depth analysis of backup and restoration strategies in Kubernetes and cloud-computing environments from various perspectives. Sammer [7] contrasted the practicality and efficiency of notable Kubernetes backup and restoration tools, such as Velero and Kasten K10, evaluating multiple backup strategies and features. Tamimi [10] offered comprehensive guidelines for selecting and implementing backup and restoration tools and analyzed their feasibility in real-world business settings. Yu [11] focused on application recovery in cloud-based aerospace ground systems and suggested recovery strategies and automated methods at the application level. All the studies collectively explored diverse methods to enhance stability and availability in their respective domains. A holistic analysis of these studies assists in determining optimal backup and restoration strategies tailored to various requirements and settings.

3. Kubernetes Backup and Restoration Tools

Kubernetes backup and restoration tools offer functionalities essential for safely managing the data and system state of containerized applications [15]. Data are one of the core assets of today’s enterprises and organizations, and their protection and management carry an immense responsibility. Improperly managed data can lead to failures and can significantly impact business continuity. These tools play a pivotal role in securely backing up critical data and settings and ensuring quick restoration in the event of failures or data loss.

Kubernetes, owing to its intricate and dynamic structure, often finds traditional backup methods insufficient. Particularly in distributed environments, such as microservice architecture, maintaining data consistency becomes even more imperative and complex. In such scenarios, Kubernetes backup and restoration tools continuously monitor the resources and states within the cluster, performing regular or ad-hoc backups to enable swift recovery upon failure. Furthermore, these tools offer advanced functionalities, such as data management in multicluster or multicloud environments, backup scheduling, and policy-based management, allowing for flexible responses in complex operational scenarios.

With the mainstream adoption of cloud computing and containerization technologies, the significance of data backup and restoration tools has increased. Kubernetes backup and restoration tools have cemented their place as crucial components of modern IT infrastructure, supporting data availability, integrity, and security while concurrently catering to business requirements and regulatory compliance.

In the subsequent subsections, we provide a detailed description and comparative analysis of notable Kubernetes backup and restoration tools, namely Velero, Kasten K10, Portworx Backup, and KubeDR. We explore the distinct features and functionalities of each tool. Additionally, by comparing the features of each tool, we analyze which tool may be more appropriate for which users. This will assist users in selecting the most suitable backup and restoration tool for their Kubernetes environment.

3.1 Velero

Velero, formerly known as Heptio Ark, facilitates the backup and restoration of Kubernetes cluster resources and persistent volumes [16]. Velero can be operated both with cloud service providers and in on-premises environments. It can perform restorations at both the cluster and application levels. This allows the recovery of an entire cluster or, if needed, recovery at the level of specific applications or namespaces. Velero comprises a server running inside the cluster and a command-line client running locally. The server includes controllers that manage the Kubernetes custom resources for backup, restoration, and related operations. Therefore, Velero can back up or restore an entire cluster or objects filtered by type, namespace, or label. It is considered an ideal tool not only for disaster recovery use cases, but also for snapshotting application states before performing system operations on a cluster.

3.1.1 Backup in Velero

Velero manages and creates backups for the data and resources of a Kubernetes cluster. The types of backup include the following:

·On-demand backup: Users can manually initiate backup as required. This operation preserves the state of the Kubernetes cluster, allowing the restoration of the entire or parts of the cluster when needed.

·Scheduled backup: Velero conducts automatic backups at recurring intervals, in the manner of cron jobs. This ensures that users maintain consistent snapshots of the Kubernetes cluster, which is highly beneficial for regular inspections and disaster recovery scenarios.

The backup flow begins by executing the ``velero backup create'' command, which interacts with the Kubernetes API server. The BackupController acknowledges the newly created backup object and conducts a validation. The data are then fetched by querying the API server, and the backup files are subsequently uploaded to the object storage service. By default, Velero creates disk snapshots for all the persistent volumes. This snapshot creation can be adjusted with specific flags and deactivated using the ``--snapshot-volumes=false'' option.
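As a minimal sketch of how the two backup types map onto the command line, the following helper functions assemble argument lists for `velero backup create` and `velero schedule create`. The function names, the backup names, the namespace `shop`, and the cron expression are illustrative assumptions and do not come from the tools evaluated here; the helpers only build the commands and do not run them.

```python
import shlex


def velero_backup_command(name, namespaces=(), selector=None, snapshot_volumes=True):
    """Build the argument list for an on-demand `velero backup create` call."""
    args = ["velero", "backup", "create", name]
    if namespaces:
        # Limit the backup to the listed namespaces (comma-separated).
        args += ["--include-namespaces", ",".join(namespaces)]
    if selector:
        # Limit the backup to objects matching a label selector.
        args += ["--selector", selector]
    if not snapshot_volumes:
        # Turn off the default persistent-volume snapshots.
        args.append("--snapshot-volumes=false")
    return args


def velero_schedule_command(name, cron_expr, namespaces=()):
    """Build the argument list for a recurring `velero schedule create` call."""
    args = ["velero", "schedule", "create", name, "--schedule", cron_expr]
    if namespaces:
        args += ["--include-namespaces", ",".join(namespaces)]
    return args


if __name__ == "__main__":
    # Hypothetical names; print the commands instead of executing them.
    print(shlex.join(velero_backup_command("shop-backup", ["shop"], "app=mysql", False)))
    print(shlex.join(velero_schedule_command("shop-nightly", "0 1 * * *", ["shop"])))
```

Returning an argument list rather than a single string keeps the sketch safe to pass to `subprocess.run` without shell quoting concerns.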

Fig. 1 illustrates the backup flow of Velero. In Step 1, the Velero user creates a backup custom resource (CR) by calling the Kubernetes API server. In Step 2, the backup controller is set to monitor the newly created backup CR from the Kubernetes API server. In Step 3, the backup controller queries the Kubernetes API server to collect data for backup. The backup controller invokes the cloud storage service to upload the backup file in Step 4, and in Step 5, it generates the disk snapshot of the Persistent Volume (PV).

Fig. 1. Backup flow of Velero.

3.1.2 Restoration in Velero

Velero restores the data and resources from previous backups. It can restore all objects and PVs or use filters to specify a subset of objects and volumes that one wants to restore. This accommodates various scenarios that can arise in Kubernetes clusters.

By default, Velero conducts nondestructive restorations, meaning that it does not delete data in the target cluster. If a resource from the backup already exists in the target cluster, it is skipped. However, users can utilize the ``--existing-resource-policy'' restore flag to configure Velero to update existing resources to match those in the backup.
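The nondestructive behavior described above can be illustrated with a toy model. This is not Velero's implementation; it only mimics the documented semantics, assuming that the cluster and the backup are represented as dictionaries keyed by a hypothetical resource identifier and that the policy takes the values `none` (skip existing resources) or `update`.

```python
def restore(cluster, backup, existing_resource_policy="none"):
    """Return a new cluster state after applying `backup` nondestructively.

    Both arguments map a resource key such as "shop/ConfigMap/settings"
    to its manifest. Nothing in `cluster` is ever deleted.
    """
    if existing_resource_policy not in ("none", "update"):
        raise ValueError("unknown policy: " + existing_resource_policy)
    result = dict(cluster)
    for key, manifest in backup.items():
        if key not in result:
            result[key] = manifest  # missing resource: create it
        elif existing_resource_policy == "update":
            result[key] = manifest  # existing resource: overwrite from backup
        # Otherwise the existing resource is skipped and left unchanged.
    return result
```

In both modes, resources that exist only in the target cluster survive the restoration, which is what distinguishes this approach from a destructive "replace everything" restore.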

The restoration flow starts by executing the ``velero restore create'' command, in which the client sends a request to the Kubernetes API server to produce a restore object. Subsequently, the RestoreController recognizes the new restore object and validates it. After retrieving the backup details from remote storage, it preprocesses the backup resources. This step is crucial for ensuring that the restored resources function properly in the cluster. Subsequently, the resources are individually restored.
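The restore object that the client submits is a Kubernetes custom resource. The sketch below builds such an object as a plain dictionary, assuming the `velero.io/v1` API group and a Velero installation in the `velero` namespace; the function name and the example names are hypothetical, and only a few of the available spec fields are shown.

```python
import json


def restore_manifest(name, backup_name, namespaces=None, update_existing=False):
    """Build a Velero Restore custom resource as a plain dictionary."""
    spec = {"backupName": backup_name}
    if namespaces:
        # Restore only the listed namespaces from the backup.
        spec["includedNamespaces"] = list(namespaces)
    if update_existing:
        # Ask Velero to update resources that already exist in the cluster.
        spec["existingResourcePolicy"] = "update"
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Restore",
        "metadata": {"name": name, "namespace": "velero"},  # assumed install namespace
        "spec": spec,
    }


if __name__ == "__main__":
    # Hypothetical names; the JSON could be submitted with `kubectl apply -f -`.
    print(json.dumps(restore_manifest("shop-restore", "shop-backup", ["shop"]), indent=2))
```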

Fig. 2 illustrates the restoration flow of Velero. In Step 1, the Velero user initiates a restore CR by calling the Kubernetes API server. In Step 2, the restore controller detects the newly created restore CR and monitors it along with the Kubernetes API server. In Step 3, the restore controller fetches backup details from the cloud storage service and then preprocesses the backed-up resources to verify that they operate within the cluster. The restore controller sends a request to the Kubernetes API server to initiate the restoration process in Step 4.

Fig. 2. Restoration flow of Velero.

3.2 Kasten K10

Kasten K10 is a data management and protection platform optimized for modern cloud-native environments and Kubernetes. It offers automated workflows for various functions, such as migration, disaster recovery, backup, and data protection, to manage an application’s lifecycle. These features are easily configurable through an intuitive user interface and provide flexibility through integration with various cloud environments and storage backends. Such multifunctionality reduces the complexity of data management when implementing a secure protection mechanism.

3.2.1 Backup in Kasten K10

The features and characteristics provided by Kasten K10 include the following:

·Policy-based Backup: Kasten K10 supports policy-based backup management. Users can define backup policies that include parameters such as the backup frequency, retention period, and target storage location. Once set, the system automatically performs backup operations based on these policies.

·Application-centric Backup: Rather than just backing up data, Kasten K10 also backs up the related Kubernetes configurations, storage information, and all metadata associated with the application. This ensures a more accurate restoration of the entire application state.

·Support for Various Storage Backends: Kasten K10 supports backups for various storage solutions and cloud environments. Users can choose the storage backend best suited to their environment to store the backup data.

·Data Efficiency: To store backup data efficiently, Kasten K10 applies deduplication, compression, and efficient data-transfer methods. This promotes storage-space optimization and faster backup operations.

·Reliability and Robustness: The backup process of Kasten K10 is designed to be robust, supporting features such as retrying interrupted backups and resuming data transfers during network instability. This ensures the reliability of the backup operations, which is particularly crucial when backing up large datasets.

·Encryption: Kasten K10 applies encryption to backup data for security, preventing unauthorized access to or leakage of data containing sensitive information.

·Scalability: Kasten K10’s backup architecture is designed considering the scalability of cloud-native environments, ensuring fast and efficient backup operations even in large and complex cluster settings.
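The data-efficiency item above combines deduplication and compression. The following is a minimal sketch of that general technique using a content-addressed chunk store; it is not Kasten K10's implementation, and the class name, the fixed chunk size, and the choice of SHA-256 and zlib are assumptions made for illustration.

```python
import hashlib
import zlib


class ChunkStore:
    """Content-addressed store: identical chunks are kept only once."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-256 hex digest -> compressed chunk bytes

    def put(self, data):
        """Store `data` and return the list of chunk digests (the recipe)."""
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                # Only previously unseen content is compressed and stored.
                self.chunks[digest] = zlib.compress(chunk)
            recipe.append(digest)
        return recipe

    def get(self, recipe):
        """Rebuild the original bytes from a recipe."""
        return b"".join(zlib.decompress(self.chunks[d]) for d in recipe)
```

A second backup of mostly unchanged data adds only the chunks whose content differs, which is why deduplication reduces both storage consumption and transfer volume.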

3.2.2 Restoration in Kasten K10

The main features and details of Kasten K10’s restoration are as follows:

·Instant Recovery: With Kasten K10’s instant recovery feature, users can quickly recover from a restoration point. Instant recovery can be performed much faster than regular restoration, significantly contributing to business continuity.

·Resource Transformation: Kasten K10 supports the transformation of Kubernetes resources during the restoration process. For instance, when restoring a recovery point created from one cloud provider to a cluster from another cloud provider, resources such as container image URLs or storage-class configurations can be modified.

·Cloning Applications: Kasten K10 provides a feature to restore applications to a target namespace that is different from the namespace of the original application. This is useful for extracting specific files or parts of the original data or for cloning an application for debugging or testing/development purposes.

·Persistent Volume Claim (PVC) Name Alteration: Kasten K10 offers the option of changing the name of the PVCs during restoration. This allows the renaming of PVCs depending on workload configurations, and this feature is also used in the StatefulSet and DeploymentConfigs settings.

·Use of Alternative Location Profiles: Users can select an alternative location profile when restoring an exported restoration point, so that applications can be restored from locations outside a cluster. This is useful when the restoration point has been copied or moved to a different location.

·Application-centric Restoration: Kasten K10 restores application data along with all metadata of the application captured during a backup. This ensures accurate restoration of the application state and prevents any interruption to the continuity of the application.

·Restoration Flexibility: Kasten K10 provides various options and configurations for restoration operations. This flexibility supports a range of restoration scenarios and requirements, allowing the selection of the most suitable restoration strategy in specific situations.
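The resource transformation item above can be pictured as a rewrite step applied to each manifest before it is created in the target cluster. The sketch below is a generic illustration and does not use Kasten K10's transform syntax; the function name, the mapping arguments, and all storage-class and registry names in the usage example are hypothetical.

```python
import copy


def transform(manifest, storage_class_map=None, registry_map=None):
    """Return a copy of `manifest` adapted to the target cluster."""
    out = copy.deepcopy(manifest)  # never mutate the backed-up manifest
    storage_class_map = storage_class_map or {}
    registry_map = registry_map or {}

    if out.get("kind") == "PersistentVolumeClaim":
        spec = out.get("spec", {})
        old = spec.get("storageClassName")
        if old in storage_class_map:
            spec["storageClassName"] = storage_class_map[old]

    # Pod templates of workloads such as Deployments live at spec.template.spec.
    pod_spec = out.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        for old_prefix, new_prefix in registry_map.items():
            if container["image"].startswith(old_prefix):
                container["image"] = new_prefix + container["image"][len(old_prefix):]
                break
    return out


if __name__ == "__main__":
    pvc = {"kind": "PersistentVolumeClaim", "spec": {"storageClassName": "source-class"}}
    print(transform(pvc, storage_class_map={"source-class": "target-class"}))
```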

3.3 Portworx Backup

Portworx Backup [17] is a backup and restoration solution tailored for the Kubernetes environment. This tool restores the applications and data across multiple clusters. Notably, with the namespace and label selector features, users can produce granular backups, even allowing backups of only specific resources of interest. This ensures consistency in the associated configurations and pod data during backup. Integrated with Portworx Central, it facilitates the management of multiple clusters and their backups from a single UI, aiding users in easily managing backups for resources they have permission to access, even in multi-user environments.

3.3.1 Backup in Portworx Backup

Portworx’s Backup capabilities offer the following diverse features and options:

·Selective Backup: Portworx Backup offers users detailed backup options. Using namespaces and label selectors, users can precisely specify the resources to be backed up. For example, one can conduct a backup operation targeting only MySQL pods, PVCs, and volumes with the ``app=mysql'' label. Such fine-grained selections back up only the necessary resources, conserving storage space and shortening restoration times.

·Scheduling and Automation: Portworx Backup supports the automation of backup tasks through scheduling policies. Users can set up backups to occur at specific times or intervals, allowing off-hour backups or running backup tasks during low-traffic times, such as weekends, to minimize system impact.

·Variety of Storage Options: When choosing a storage location for backup data, Portworx Backup offers a range of choices. It supports major object storage such as AWS S3, Azure Blob Storage, and Google Cloud Storage, as well as options such as Portworx PX-Store. This flexibility allows users to select optimal storage, considering factors such as availability, cost, and region.

·Integrated Resource Backup: Beyond data, Portworx Backup also encompasses various Kubernetes resource types within its backup purview. It provides comprehensive backup from core resources such as PV, Deployment, and Service to configuration and authentication details such as ConfigMap and Secret.

·Backup Rule System: A rule system is provided that automatically executes certain tasks or commands before and after a backup. For instance, automating tasks, such as pausing specific services before a backup, or sending notifications after a backup can enhance the accuracy and efficiency of the backup process.
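The selective backup described in the list above relies on namespace and label matching. The sketch below shows equality-based label selection over a list of manifests; it is a generic illustration rather than Portworx Backup's code, and the function names and the sample resources are assumptions.

```python
def matches(labels, selector):
    """True if every key/value pair in `selector` is present in `labels`."""
    return all(labels.get(key) == value for key, value in selector.items())


def select_for_backup(resources, namespaces, selector):
    """Pick the resources in the given namespaces whose labels match the selector."""
    return [
        r for r in resources
        if r["metadata"].get("namespace") in namespaces
        and matches(r["metadata"].get("labels", {}), selector)
    ]


if __name__ == "__main__":
    # Hypothetical cluster contents.
    resources = [
        {"kind": "Pod", "metadata": {"name": "mysql-0", "namespace": "db",
                                     "labels": {"app": "mysql"}}},
        {"kind": "Pod", "metadata": {"name": "web-0", "namespace": "db",
                                     "labels": {"app": "web"}}},
    ]
    for r in select_for_backup(resources, {"db"}, {"app": "mysql"}):
        print(r["kind"], r["metadata"]["name"])
```

Only equality-based selectors are modeled here; set-based expressions such as `in` and `notin` are left out to keep the sketch short.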

3.3.2 Restoration in Portworx Backup

Portworx Backup’s restoration capabilities offer the following diverse features and options:

·Original Restoration: Users can restore the backup data to their original cluster. This is particularly useful when there is data loss or issues with specific resources within a cluster, and there is a need to restore that resource to its original state.

·Restoration to Another Cluster: If required, backup data can be restored to different Kubernetes clusters. This can be leveraged for purposes such as testing in different environments or data migration.

·Restoration to a New Namespace: It is possible to restore data within the same cluster but at a different namespace. This can be employed for rapid service recovery in situations in which a specific namespace encounters issues.

·Namespace-based Restoration: Users can choose to restore only data corresponding to a specific namespace from an entire backup dataset. This is useful when there is a desire to individually restore data for a specific application.

·Restoration through Label Selectors: Data can be filtered and restored based on the labels. Through this feature, associated groups of resources can be effectively restored.

·Considering Resource Dependencies: The restoration process is performed by considering the dependencies of the backed-up resources. For instance, resources such as PVC, Deployment, ConfigMap, and Service related to a PV are restored sequentially based on their dependencies, ensuring their proper function after restoration.

·Direct Restoration Method: Portworx Backup restores data directly from the original storage location where it is backed up, without moving it elsewhere. This enhances the restoration speed and reduces the costs associated with data movement.

·Command Execution: Users can set up rules to execute specific commands before and after the restoration process. This allows the automation of additional tasks, such as verifying existing data before restoration or sending notification messages after restoration.
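Dependency-aware restoration, as described in the list above, amounts to a topological ordering of resource kinds. The sketch below uses the standard-library `graphlib` module (Python 3.9 or later); the dependency table is an illustrative assumption and is not taken from Portworx Backup.

```python
from graphlib import TopologicalSorter

# Each kind maps to the kinds that must be restored before it (illustrative only).
DEPENDS_ON = {
    "PersistentVolume": set(),
    "PersistentVolumeClaim": {"PersistentVolume"},
    "ConfigMap": set(),
    "Deployment": {"PersistentVolumeClaim", "ConfigMap"},
    "Service": {"Deployment"},
}


def restore_order(depends_on):
    """Return the kinds in an order where every dependency comes first."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    print(restore_order(DEPENDS_ON))
```

`TopologicalSorter` raises `graphlib.CycleError` if the table contains a circular dependency, which gives a restore planner an early failure instead of a partially restored application.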

3.4 KubeDR

KubeDR [18] is a tool designed to protect crucial data within Kubernetes clusters. Primarily, it backs up the Kubernetes objects stored in the ETCD, and optionally the cluster certificates, to an S3 bucket. KubeDR follows the widely adopted ``Operator Pattern'' in Kubernetes, which combines custom resources (CRs) with their corresponding controllers. Each CR has its own controller that performs data backup and recovery operations. The KubeDR operator uses webhooks to verify data validity and, if necessary, sets default values for resource specifications. By doing so, KubeDR enhances the data integrity and disaster recovery capabilities of Kubernetes clusters.

The key features of KubeDR include the following:

·Backup of ETCD Data and Certificates to S3: KubeDR backs up ETCD data and cluster certificates to an S3 bucket. This backup securely protects vital cluster information.

·Backup Encryption and Deduplication: The backup is encrypted for security purposes, and redundant data are eliminated to conserve the storage space.

·Backup Pause and Resume Functionality: KubeDR offers the capability to pause and resume backup operations, thereby enabling backup adjustments during specific periods.

·Control of Retained Backups via Retention Settings: Users can determine the backup retention duration and decide how long to retain backups.

Because KubeDR requires direct access to the ETCD, it operates only in clusters in which the ETCD is accessible and snapshots can be taken. This includes on-premises clusters and clusters configured explicitly on compute instances in cloud environments.

3.4.1 Backup in KubeDR

Kubernetes stores all cluster data in a distributed store called ETCD, making it imperative to periodically back up the ETCD data for disaster recovery (DR) and data loss prevention.

KubeDR performs backups based on the following features:

·Backup Target Specification: The backup target is an S3 bucket. Users define the backup target by creating a Kubernetes Custom Resource called Backup Location. KubeDR then backs up data to the designated S3 bucket.

·Credential Management: Certificates and credentials used to access S3 for backup are stored in Kubernetes ``secret'' resources. This secret includes the authentication keys required for the S3 access and passwords for encryption.

·Data Encryption: To ensure data protection, KubeDR encrypts the backup data. The encrypted backup data, stored in the S3 bucket, offers enhanced security against data breaches.

·Deduplication: Redundant backup data are eliminated to save storage space, prevent the backup of identical data blocks multiple times, and ensure efficient storage use.

·Pause and Resume Functionality: Users can pause or resume backup operations as required, allowing control over backup operations at specific times or in exceptional circumstances.

·Retention Policy Setup: Users can set a backup retention period, allowing backups to be stored for specified durations. This feature facilitates the automatic deletion or retention of old backup data.
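A retention policy such as the one in the list above reduces to comparing each snapshot's creation time with a cutoff. The following sketch illustrates that rule only; it does not reflect how KubeDR stores or expires snapshots, and the function name and data layout are assumptions.

```python
from datetime import datetime, timedelta


def expired(snapshots, retention, now):
    """Return the IDs of snapshots older than `retention` at time `now`.

    `snapshots` maps a snapshot ID to its creation time.
    """
    cutoff = now - retention
    return sorted(sid for sid, created in snapshots.items() if created < cutoff)


if __name__ == "__main__":
    # Hypothetical snapshot catalog.
    catalog = {
        "snap-old": datetime(2024, 4, 1),
        "snap-new": datetime(2024, 5, 8),
    }
    print(expired(catalog, timedelta(days=7), now=datetime(2024, 5, 10)))
```

Passing `now` explicitly instead of calling `datetime.now()` inside the function keeps the rule deterministic and easy to test.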

3.4.2 Restoration in KubeDR

KubeDR supports two types of restoration: DR restoration and Regular Restore.

DR restoration is used to configure a new cluster after a master node has been lost. To perform DR restoration, the following steps are performed:

·Backup Browsing: Users review the backup snapshots in the target S3 bucket and select the snapshot ID to restore from.

·Restoration: Using the kubedrctl command or a Docker command, the data are restored from the chosen snapshot ID. The restored data consist of ETCD snapshot files and, optionally, certificate files. A new cluster is then configured from the restored data.

Regular Restoration is employed when the cluster is operational but access to the certificates or the ETCD snapshot is required. To perform a regular restoration, the following steps are taken:

·Backup Browsing: After each successful backup, KubeDR creates a resource named MetadataBackupRecord. Through this resource, all backup snapshots are listed chronologically, allowing for the selection of a backup to be restored from.

·Restoration Settings: Users create a PersistentVolumeClaim (PVC) to define the source and target for the restoration. The PVC is the Kubernetes resource used to store the data to be restored.

·Initiate Restoration: By creating a MetadataRestore resource, the restoration operation is triggered. This allows KubeDR to restore the data from the backup snapshot and save it to the PVC.

3.5 Comparative Analysis of Backup and Restoration Tools in Kubernetes

We conducted a comparative analysis of the backup and restoration tools by considering ten criteria. The criteria and associated analyses are as follows.

·Application and Cluster Recovery Support: Velero is the only tool that supports both application-level and cluster-level recovery. This support ensures application integrity and cluster stability in complex environments. In contrast, Kasten K10 and Portworx Backup support only application recovery, which may not be suitable in situations where cluster recovery is required. KubeDR supports only cluster recovery, making it less suitable when application recovery is crucial.

·Cloud Connectivity: Most backup and restoration tools can connect with various cloud platforms, such as AWS, GCP, and Azure. This connectivity boosts cloud-to-cloud mobility and provides flexibility in diverse environments.

·Backup Location Support: The backup location refers to the physical space in which data are stored and is central to the backup strategy. Velero, Kasten K10, and Portworx Backup can back up to both S3 and block stores, with Kasten K10 additionally supporting file store backups. However, KubeDR is limited to backing up to S3 only. Support for backup locations plays a crucial role in determining data accessibility and integrity, particularly when formulating DR strategies.

·Role-Based Access Control (RBAC) Support: RBAC is a crucial feature for enhancing security, which is only supported by Kasten K10 and Portworx Backup. The lack of this security feature in Velero and KubeDR can increase the risk associated with data access.

·Encryption Support: Most tools support authentication and data encryption, thereby reinforcing data security. This is particularly significant in environments focusing on personal data protection and regulatory compliance.

·User Interface: Kasten K10 and Portworx Backup offer user-friendly experiences with their web UIs. Velero and KubeDR only support CLI, making them potentially unsuitable for those seeking a more intuitive interface.

·Cost: Velero and KubeDR are available for free and offer cost-effective solutions. Kasten K10 and Portworx Backup require paid licenses, necessitating budget considerations.

·Automatic Cluster Recovery Support: No tool supports automatic cluster recovery, so environments that require this feature need manual intervention or additional automation.

·Automatic Application Recovery Support: Kasten K10 supports automatic application recovery, although its setup can be complex, and its functionalities are limited. This might introduce challenges during the initial setup and may not meet specific requirements.

Table 1 summarizes the descriptive comparative analysis of Kubernetes backup and restoration tools. This indicates that the tool one should opt for requires a holistic consideration of specific user requirements, budget, technical proficiency, security needs, etc. In particular, the needs for cluster and application recovery, as well as budget considerations, are paramount.

In conclusion, selecting and adequately implementing the most suitable tool according to the system complexity and requirements maximizes data protection and operational efficiency. Carefully weighing the various factors ensures that the chosen solution aligns with the user’s goals.
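The selection process described above can be sketched by encoding part of the comparison in Table 1 as data and filtering on the features a user requires. The `Tool` record, its field names, and the `candidates` helper are hypothetical. The boolean values are taken from Table 1, with Velero's lack of multi-user RBAC recorded as no RBAC support.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    app_recovery: bool      # application-level recovery
    cluster_recovery: bool  # cluster-level recovery
    rbac: bool              # role-based access control
    web_ui: bool            # web UI in addition to the CLI
    free: bool              # open source and free of charge


# Values transcribed from Table 1.
TOOLS = [
    Tool("Velero", True, True, False, False, True),
    Tool("Kasten K10", True, False, True, True, False),
    Tool("Portworx Backup", True, False, True, True, False),
    Tool("KubeDR", False, True, False, False, True),
]


def candidates(*required):
    """Return the names of tools that provide every required feature."""
    return [t.name for t in TOOLS
            if all(getattr(t, feature) for feature in required)]


if __name__ == "__main__":
    print(candidates("cluster_recovery", "free"))
    print(candidates("app_recovery", "rbac"))
    print(candidates("cluster_recovery", "rbac"))
```

The last query returns an empty list, which reflects the gap noted in the analysis: none of the surveyed tools combines cluster-level recovery with RBAC support.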

Table 1. Comparative Analysis of Backup and Restoration Tools in Kubernetes.

| Criterion | Velero | Kasten K10 | Portworx Backup | KubeDR |
|---|---|---|---|---|
| Application-level Recovery Support | Supports application-level recovery | Supports application-level recovery | Supports application-level recovery | Does not support application-level recovery |
| Cluster-level Recovery Support | Supports cluster-level recovery | Does not support cluster-level recovery | Does not support cluster-level recovery | Supports cluster-level recovery |
| Supported Cloud Platforms | AWS, GCP, Azure, etc. | AWS, GCP, Azure, etc. | AWS, GCP, Azure, etc. | AWS, GCP, Azure, etc. |
| Backup Technology Support | Supports backup to S3 and block stores | Supports backup to S3, block stores, and file stores | Supports backup to S3 and block stores | Supports backup to S3 only |
| RBAC Support | Does not support multi-user RBAC | Supports RBAC | Supports RBAC | Does not support RBAC |
| Encryption | Supports authentication and authorization, with optional data encryption | Supports authentication and authorization, with data encryption | Supports authentication and authorization, with data encryption | Supports authentication and authorization |
| User Interface | Supports CLI only | Supports both Web UI and CLI | Supports both Web UI and CLI | Supports CLI only |
| Cost | Open source, free | Requires license, paid | Requires license, paid | Open source, free |
| Automatic Cluster-level Recovery Support | Does not support automatic cluster-level recovery | Does not support automatic cluster-level recovery | Does not support automatic cluster-level recovery | Does not support automatic cluster-level recovery |
| Automatic Application-level Recovery Support | Does not support automatic application-level recovery | Supports automation through user configuration | Does not support automatic application-level recovery | Does not support automatic application-level recovery |

4. Multicluster Management Platform

A multicluster management platform provides an environment in which various clusters and services are integrated and managed. As the complexity of cloud-based services and applications increases, consistent management and deployment across multiple clusters become essential.

The multicluster management platform enhances efficiency and flexibility in such intricate settings, reducing operational burdens through resource optimization and automation. As the number of clusters to be managed increases, the error probability increases, and maintenance and monitoring become challenging without a centralized management platform.

The adoption of a multicluster management platform simplifies the arduous task of manually managing individual clusters, supporting consistent policy application, monitoring, and stable deployments. Consequently, businesses and organizations leverage these platforms to boost the efficiency of their IT infrastructure and reinforce the competencies that are crucial for achieving business objectives.

The following subsections provide a detailed description and comparative analysis of leading multicluster management platforms, namely Rancher, KubeSphere, Razee, and OpenShift. We examine the unique features and functionalities of each platform and compare them to determine which are most suitable for different users. Users can thereby gain insight into selecting the most suitable multicluster management platform for their Kubernetes environment.

4.1 Rancher

Rancher [19] is a container management platform that operates Kubernetes clusters in on-premise, cloud, or edge environments. Rancher is ideal for multicluster, hybrid, or multicloud container scenarios. It centralizes authentication and RBAC for all clusters, enabling global administrators to control cluster access from a single location. Rancher lets administrators import and manage clusters through a single interface, apply consistent security policies, monitor logs, and oversee overall performance.

4.1.1 Architecture of Rancher

Fig. 3 illustrates the structure of a Rancher server installation managing one Kubernetes cluster configured with the Rancher Kubernetes Engine and another Kubernetes cluster configured with the Amazon Elastic Kubernetes Service. The description of the functions of each Rancher server component is as follows:

Fig. 3. Architecture of Rancher.

·Authentication Proxy (Auth Proxy): The Auth Proxy is integrated with Rancher’s authentication services to manage user authentication. Before passing Kubernetes API calls to the underlying clusters, it authenticates the caller and sets the appropriate Kubernetes impersonation headers to securely relay the request. Users can access the resources in the underlying clusters through the Auth Proxy using either the Rancher UI or kubectl commands.

·Rancher API Server: This is the central component of Rancher, responsible for managing Rancher resources, such as clusters, projects, users, and apps. Through interfaces such as the Rancher UI, users can perform cluster-related operations using the Rancher API.

·Cluster Controller: The Cluster Controller monitors state changes for each underlying cluster and manages cluster resources to transition them to the desired state. It also configures access control policies for clusters and projects and provisions clusters. Each underlying cluster has a Cluster Controller that reports the state of the cluster back to the Rancher server.

·Cluster Agent: Operating within the underlying cluster, this Rancher component communicates with Rancher and manages resources within the cluster. The Cluster Agent connects to the Kubernetes API of Rancher-launched Kubernetes clusters. It manages workloads, creates and deploys pods, applies policies, and relays events, metrics, node information, and state between the cluster and the Rancher server. Each underlying cluster has one Cluster Agent that connects to the Cluster Controller.

Each of these components supports Rancher functionalities and plays an essential role in cluster management. The Authentication Proxy is responsible for user authentication, the Rancher API Server provides core Rancher functions, and the Cluster Controller and Cluster Agent monitor and manage the underlying clusters.
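The relay step performed by the Authentication Proxy can be sketched as follows. `Impersonate-User` and `Impersonate-Group` are the standard Kubernetes impersonation headers. The session table, the token values, and the `relay_headers` function are assumptions made for illustration and do not reflect Rancher's internal code.

```python
# Hypothetical session table: bearer token -> (user name, groups).
SESSIONS = {"token-alice": ("alice", ["dev", "ops"])}


def relay_headers(incoming, proxy_token, sessions=SESSIONS):
    """Build the headers forwarded to the downstream Kubernetes API.

    Headers are returned as a list of pairs because Impersonate-Group
    may appear once per group.
    """
    auth = incoming.get("Authorization", "")
    token = auth[len("Bearer "):] if auth.startswith("Bearer ") else ""
    if token not in sessions:
        raise PermissionError("unauthenticated caller")
    user, groups = sessions[token]

    # Drop the caller's credential and any impersonation headers the
    # caller supplied, so that identities cannot be spoofed.
    outgoing = [(k, v) for k, v in incoming.items()
                if k.lower() != "authorization"
                and not k.lower().startswith("impersonate-")]

    # The proxy authenticates as itself and asserts the caller's identity.
    outgoing.append(("Authorization", "Bearer " + proxy_token))
    outgoing.append(("Impersonate-User", user))
    outgoing.extend(("Impersonate-Group", g) for g in groups)
    return outgoing
```

For this scheme to work, the proxy's own credential must be authorized in the downstream cluster to impersonate users and groups. The cluster's RBAC rules are then evaluated against the impersonated identity.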

4.1.2 Backup and Restoration of Rancher

The backup and restoration functionalities of Rancher are composed of the following elements:

1. Backup Creation

·Periodic Backup: Rancher takes periodic snapshots based on a user-defined schedule. These backups preserve the critical states of the Rancher Server and cluster data, thereby facilitating DR.

·One-time Backup: Users can manually initiate backups, as needed. This is beneficial for taking additional backups before significant changes occur or in anticipation of unforeseen events.

2. Backup Configuration

·Snapshot Contents: The backup includes data from the Rancher Server, as well as the state and configuration information of the Kubernetes cluster. This allows users to complete the restoration of the Rancher system.

·Diverse Options: During backup, users can select only the data that they require. For instance, they can choose to back up only the ETCD data, enabling the recovery of vital cluster information.

3. Backup Storage

·Local Storage: By default, backup data are stored on the local disk of the node where the Rancher Server runs. Local backups offer a simple configuration and facilitate swift backups and restoration.

·S3-Compatible Storage: Instead of local storage, users can configure S3-compatible storage to remotely store backup data. This ensures that, even if all ETCD nodes are lost, the cluster can still be restored using remote snapshots.

4. Restoration

·Snapshot-based Restoration: Users can restore both the Rancher Server and cluster using the created backup snapshots. This allows users to quickly revert the system to its original state in the event of unexpected disruption or data loss.

·Detailed Restoration Options: If needed, users can opt to restore only ETCD data or include Kubernetes versions and cluster configuration information. This is useful for restoring specific parts of the cluster.

Although Rancher’s backup and restoration functions do not directly support application-level backup and restoration, they reliably protect the Rancher system itself, which helps keep the applications running on it safe and available. Through backup and restoration, users can prepare for potential disaster scenarios and swiftly restore their clusters to their original states. This enables users to operate a secure and reliable Kubernetes environment based on Rancher.

4.2 KubeSphere

KubeSphere [20] is an enterprise-grade container management platform that leverages Kubernetes and cloud-native technologies based on open-source code. This platform enables the easy use of Kubernetes, even in complex cluster environments, and effectively supports the deployment, operation, and management of containerized applications. It provides all the tools and features required to integrate and manage Kubernetes clusters in various cloud and on-premise environments, allowing for the efficient allocation and central monitoring of resources.

4.2.1 Key Features of KubeSphere

KubeSphere offers a range of functionalities including the following:

·Multitenancy: Allows various teams or projects to share a single Kubernetes cluster while operating independently. This facilitates optimal resource utilization, and each team or project can work autonomously within their namespaces.

·DevOps Support: KubeSphere provides features for continuous integration (CI) and continuous deployment (CD). This allows development and operations teams to manage the entire lifecycle of an application efficiently. Such DevOps capabilities automate the process from development to deployment, enabling swift and stable software rollouts.

In addition, KubeSphere provides a plethora of features, such as inter-cluster networking, service mesh, high scalability, and support for various storage and network plugins. Through these capabilities, users can simplify intricate tasks related to Kubernetes and effectively oversee the cluster operations.

4.2.2 Backup and Restoration in KubeSphere

KubeSphere does not offer backup or restoration functionalities directly. Instead, it supports the integration of separate backup and restoration tools. For example, it can be used in conjunction with Velero to enable backup and restoration of both clusters and applications.

4.3 Razee

Razee [21] is an open-source project developed by IBM. It was designed to automate the deployment and management of Kubernetes resources across multicluster environments, addressing the need to scale Kubernetes consistently across multiple clusters, environments, and cloud providers.

Razee comprises three primary, loosely coupled modules: RazeeDash, RazeeDeployables, and RazeeDeploy. Each serves different needs and situations within a cluster and can be used independently as required.

RazeeDash focuses on visualizing deployment information by dynamically generating a real-time inventory of the Kubernetes resources. In turn, RazeeDeploy and RazeeDeployables emphasize the efficient templating of Kubernetes resources and the automation of resource deployment and management in multicluster environments.

Through this configuration, Razee offers an integrated platform for businesses and organizations to simplify resource deployment and management across clusters while also providing clear insights into cluster status and deployment conditions.

4.3.1 Key Features of Razee

Razee offers several features, as follows:

·Cluster Inventory Management: By utilizing RazeeDash and Watchkeeper, users can add clusters to the Razee inventory list. This feature allows for the monitoring of Kubernetes resource deployment statuses using intelligent filters and alerts, providing real-time visualization and management of each cluster’s current state and configuration information.

·CD Across Clusters and Environments: Using RazeeDeployables, users can control and automate their Kubernetes resource deployment across clusters and environments. They can leverage this by adding all clusters to the Razee inventory and subscribing clusters to publishing channels that contain the desired Kubernetes resource versions.

·Templating Kubernetes Resources: RazeeDeploy includes custom resource definitions (CRDs) that assist users in dynamically generating Kubernetes resources based on set feature flags or variables. This helps group resources and automatically applies them to clusters.

·RemoteResource & RemoteResourceS3: These CRDs and controllers are utilized to automatically deploy Kubernetes resources stored in source repositories.

·MustacheTemplate: A CRD and controller defining environment variables that can replace specific parts within other Kubernetes YAML files.

·FeatureFlagSetLD: A CRD and controller that automatically fetches feature flag values from LaunchDarkly. This enables users to control the code deployed to clusters and manage multiple versions of Kubernetes resources across various clusters, environments, or clouds.

·ManagedSet: A CRD and controller grouping Kubernetes resources intended for simultaneous creation and application to clusters.
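The templating idea behind MustacheTemplate can be illustrated with a simplified stand-in. The `render` function below only substitutes `{{ name }}` placeholders in a YAML string. It is an assumption made for illustration and covers neither the full Mustache language nor Razee's controller logic.

```python
import re

# Matches {{ name }} with optional surrounding whitespace.
_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render(template, env):
    """Replace each {{ name }} placeholder with env[name].

    Raises KeyError for undefined variables so that an incomplete
    environment never produces a half-rendered manifest.
    """
    def substitute(match):
        key = match.group(1)
        if key not in env:
            raise KeyError("undefined template variable: " + key)
        return env[key]

    return _PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    template = "replicas: {{ replicas }}\nimage: app:{{version}}"
    print(render(template, {"replicas": "3", "version": "1.2"}))
```

Rendering the same template with a different environment per cluster is what allows one resource definition to be reused across clusters, environments, or clouds.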

4.3.2 Backup and Restoration in Razee

As a tool specialized primarily for the deployment and management of Kubernetes resources, Razee does not inherently offer backup and restoration functionalities. However, open-source tools such as Velero can be used to securely back up and restore Kubernetes resources and data. By integrating these tools with Razee, resources deployed via Razee can be safely archived and subsequently restored. Because this capability lies outside Razee itself, integration with Kubernetes backup and restoration tools is essential to enhance cluster stability and data protection.

4.4 OpenShift

OpenShift [22] is a comprehensive container application platform that inherently integrates the Kubernetes container cluster management and orchestration system and multicluster management technology, all combined on an enterprise foundation of Red Hat Enterprise Linux.

OpenShift facilitates the rapid construction, development, and deployment of applications on almost any infrastructure type, from public to private clouds and on premise, without being restricted to specific application architectures. Enterprises can rapidly commercialize their ideas owing to this flexibility. In essence, OpenShift provides an integrated operational environment supporting Kubernetes by employing Docker containers and DevOps tools.

4.4.1 Key Features of OpenShift

The primary features of OpenShift include the following:

·Container Orchestration: Based on Kubernetes, OpenShift automates the placement, scheduling, and management of containers. Through container orchestration, it supports the automated deployment and scaling of applications, maintaining their availability and performance.

·CI/CD Pipeline: When developers commit an application code to a version control system within OpenShift, it is automatically built, tested, and deployed, enabling continuous integration and deployment and fostering collaboration between development and operations teams.

·Multicloud Support: OpenShift offers a consistent experience across various environments, including public clouds, on-premises, hybrid clouds, and edge architectures. Users can deploy applications flexibly and efficiently across multiple clouds and on-premise resources.

·Support for Stateful and Stateless Applications: OpenShift supports both stateful and stateless applications by directly connecting persistent storage to Linux containers.

·Security Features: OpenShift provides a secure environment by isolating containers and applying security policies. Furthermore, it offers detailed access control to resources through policy-based permissions, ensuring application and data security.

·Automation and Management Features: OpenShift’s master server automatically manages pods, handling installation, load monitoring, error detection, and general monitoring, thus reducing the burden on the operational team and ensuring application stability and availability.

4.4.2 Backup in OpenShift

Backup and restoration in OpenShift are crucial tasks for ensuring the stability of important data and applications. Backups safeguard data from system failures, data losses, and user errors, whereas restoration restores data and systems after such events. Various methods can be employed for backup in OpenShift:

·Backing up Persistent Volumes: OpenShift uses persistent volumes (PVs) to retain application data. To back up PV data, one can either use the backup tool of the associated storage backend or migrate the PV to a different storage class and use its backup feature.

·Backing up the ETCD Database: OpenShift cluster information and configuration are stored in the ETCD database, a crucial component requiring regular backups. ETCD snapshots can be created as backups to ensure data recovery in the case of system failures.

·Application Code Backup: Application code in OpenShift is stored in code repositories, e.g., Git. The application code is backed up through regular backups of the Git repository.

·Backup of OpenShift Cluster Configuration: OpenShift cluster configuration is another important component. Backing up the cluster configuration file ensures the restoration of the cluster state.

·Backup of OpenShift Console: The OpenShift console, a web interface for cluster management and monitoring, also requires backups.
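As an illustration of the ETCD backup item above, the following sketch assembles a generic `etcdctl snapshot save` command with a timestamped file name. The function name, the directory layout, and the certificate paths are assumptions. OpenShift documents its own backup procedure, which should be preferred on a real cluster. The sketch only builds the argument list and leaves execution to the caller.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def etcd_snapshot_command(backup_dir, endpoint, cacert, cert, key, now=None):
    """Return the argv for saving an ETCD snapshot to a timestamped file."""
    now = now or datetime.now(timezone.utc)
    target = PurePosixPath(backup_dir) / f"etcd-{now:%Y%m%dT%H%M%SZ}.db"
    return [
        "etcdctl",
        f"--endpoints={endpoint}",
        f"--cacert={cacert}",  # CA bundle used to verify the ETCD server
        f"--cert={cert}",      # client certificate
        f"--key={key}",        # client private key
        "snapshot", "save", str(target),
    ]


if __name__ == "__main__":
    # Hypothetical paths; adjust to the cluster's actual certificate files.
    argv = etcd_snapshot_command(
        "/var/backup", "https://127.0.0.1:2379",
        "/etc/etcd/ca.crt", "/etc/etcd/client.crt", "/etc/etcd/client.key")
    print(" ".join(argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)` on a node that has `etcdctl` installed and access to the ETCD client certificates.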

4.4.3 Restoration in OpenShift

·Cluster-Level Restoration: Cluster-level restoration is carried out using backup snapshots of the ETCD database, which contains the cluster’s configuration and state information. This restoration process returns the cluster back to its previous state.

·Application-Level Restoration: Application data in OpenShift are stored using PV. Restoration at the application level is performed using the backup data of the PV either by restoring the data from the storage backend or by migrating the PV to a different storage class and then restoring the data.

·Restoring Application Code and Configuration: Application code in OpenShift is stored in code repositories, such as Git. Restoration of the application code and configuration is achieved by rolling back the code in the Git repository or reverting the cluster configuration file to a previous version.

4.5 Comparative Analysis of Multicluster Management Platform

We conducted a comparative analysis of multicluster management platforms by considering eight criteria. The criteria and associated analyses are described as follows:

·Restoration of Application Code and Configuration: In OpenShift, the application code is stored in code repositories, such as Git, and the cluster configuration is a crucial aspect. Restoration of the application code and its configuration is achieved by either reverting the code in the Git repository to a previous state or rolling back the cluster configuration file to an earlier version.

·Cloud Connectivity: All platforms can connect to various cloud platforms such as AWS, GCP, and Azure. Such connectivity enhances cloud mobility and offers flexibility in diverse environments. Cloud integration is an essential aspect of multicluster management and is indispensable in modern IT environments.

·Backup Location Support: OpenShift supports various backup locations and offers a range of options through integration with other platforms. Rancher supports backup to S3 and local storage, whereas KubeSphere and Razee provide no built-in backup and rely on third-party tools. The backup location plays a crucial role in DR strategies, and a broader range of choices offers more flexibility.

·RBAC Support: All platforms support Role-Based Access Control, enabling user-specific permission management. This strengthens security and enhances team collaboration efficiency. Support for RBAC is becoming a standard in cluster management today, elevating both management convenience and security.

·Encryption Support: Authentication and data encryption are fundamentally provided across all platforms and are particularly essential in environments that focus on personal information protection and regulatory compliance. Encryption support ensures data integrity and security, reducing the risk of sensitive information breaches.

·User Interface: Every platform offers support for both the web UI and CLI, providing a user-friendly interface. The user interface lowers barriers to platform usage and facilitates convenient management and monitoring.

·Cost: OpenShift incurs costs depending on the enterprise version, whereas the other platforms are provided as open source. Cost considerations are particularly important for small-to-medium enterprises and budget-restricted projects. Choosing a solution that delivers optimal performance at a minimal cost is essential.

·Automatic Recovery Feature: None of the platforms supports automatic recovery, so the recovery process must be managed manually. As cluster or application recovery is a critical task, the consideration of such features is essential when selecting a platform.

Table 2 presents a comparative analysis of multicluster management platforms. In conclusion, the chosen multicluster management platform must consider users’ specific requirements, technological capabilities, security needs, and more. Recovery support, backup location, interface convenience, and costs are significant considerations. Selecting and appropriately implementing the most suitable tool based on the system’s complexity and requirements is key to maximizing the operational efficiency. Various elements require careful evaluation to choose the optimal solution that aligns with the user’s goals.

Table 2. Comparative Analysis of Multicluster Management Platforms.

| Criterion | Rancher | KubeSphere | Razee | OpenShift |
|---|---|---|---|---|
| Application-level Recovery Support | Does not support application-level recovery | Does not support application-level recovery | Does not support application-level recovery | Supports application-level recovery |
| Cluster-level Recovery Support | Supports cluster-level recovery | Does not support cluster-level recovery | Does not support cluster-level recovery | Supports cluster-level recovery |
| Supported Cloud Platforms | AWS, GCP, Azure, etc. | AWS, GCP, Azure, etc. | AWS, GCP, Azure, etc. | AWS, GCP, Azure, etc. |
| Backup Technology Support | Supports S3 and local storage | Does not support (users must perform backups using third-party tools) | Does not support (users must perform backups using third-party tools) | Supports backup to S3 and block stores, and integrates with other storage solutions |
| RBAC Support | Supports multi-user RBAC | Supports multi-user RBAC | Supports multi-user RBAC | Supports multi-user RBAC |
| Encryption | Supports authentication and authorization, with data encryption | Supports authentication and authorization, with data encryption | Supports authentication and authorization, with data encryption | Supports authentication and authorization, with data encryption |
| User Interface | Supports both Web UI and CLI | Supports both Web UI and CLI | Supports both Web UI and CLI | Supports both Web UI and CLI |
| Cost | Open source, free; commercial versions available | Open source, free | Open source, free | Depends on the subscription (enterprise versions are not free) |
| Automatic Cluster-level Recovery Support | Does not support automatic cluster-level recovery | Does not support automatic cluster-level recovery | Does not support automatic cluster-level recovery | Does not support automatic cluster-level recovery |
| Automatic Application-level Recovery Support | Does not support automatic application-level recovery | Does not support automatic application-level recovery | Does not support automatic application-level recovery | Does not support automatic application-level recovery |

5. Future Work

We present future research and development directions for multicluster management. There is a growing need for a system that combines a multicluster management platform capable of verifying cluster status with backup and restoration tools that support recovery at both the cluster and application levels, and that adds automated recovery functions on top. Such a system would enable automatic recovery at both levels in the event of a disaster, further enhancing operational efficiency and system integrity. With a more detailed analysis and the addition of recovery automation, a more robust and trustworthy system can be built from integrated multicluster management and backup and restoration tools. This research is significant because it can move beyond current technical limitations to serve an organization’s strategic objectives.
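One possible shape for such an automated recovery function is sketched below. It is a hypothetical illustration rather than a design proposed in this study: `probe` stands in for the cluster status verification offered by a management platform, `restore` stands in for a call to a backup and restoration tool, and the failure threshold is an assumed parameter that prevents a single transient failure from triggering a restoration.

```python
def auto_recover(probe, restore, checks, failure_threshold=3):
    """Run `checks` health probes and restore after repeated failures.

    probe:   callable returning True when the cluster is healthy
    restore: callable that triggers a restoration from the latest backup
    Returns the number of restorations that were triggered.
    """
    consecutive_failures = 0
    restores = 0
    for _ in range(checks):
        if probe():
            consecutive_failures = 0
            continue
        consecutive_failures += 1
        if consecutive_failures >= failure_threshold:
            restore()
            restores += 1
            consecutive_failures = 0  # start counting again after recovery
    return restores
```

A production system would add waiting between probes, verification that the restoration succeeded, and separate handling of cluster-level and application-level failures.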

6. Conclusion

This study conducts a comprehensive comparative analysis of multicluster management platforms and backup and restoration tools to aid business organizations in adopting proper DR tools in cloud-native environments. Our analysis goes beyond mere functional comparisons, leading to proposals for fruitful future research to bridge the gap between diverse user requirements, including security issues, and existing tools.

ACKNOWLEDGMENTS

This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-02082, CDM_Cloud: Multicloud Data Protection and Management Platform).

This work was supported by 2023 Hongik University Innovation Support Program Fund.

REFERENCES

1 
G.-Z. Sun, Y. Dong, D.-W. Chen, and J. Wei, “Data Backup and Recovery Based on Data De-Duplication,” in 2010 International Conference on Artificial Intelligence and Computational Intelligence, Sanya, China: IEEE, Oct. 2010, pp. 379-382.DOI
2 
P. Menard, R. Gatlin, and M. Warkentin, “Threat Protection and Convenience: Antecedents of Cloud-Based Data Backup,” Journal of Computer Information Systems, vol. 55, no. 1, pp. 83-91, Sep. 2014,DOI
3 
A. A. Tamimi, R. Dawood, and L. Sadaqa, “Disaster Recovery Techniques in Cloud Computing,” 2019 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology (JEEIT), 2019, pp. 845-850,DOI
4 
S. Prakash, S. Mody, A. Wahab, S. Swaminathan, and R. Paramount, “Disaster recovery services in the cloud for SMEs,” 2012 International Conference on Cloud Computing Technologies, Applications and Management (ICCCTAM), 2012, pp. 139-144,DOI
5 
S. Suguna and A. Suhasini, “Overview of Data Backup and Disaster Recovery in Cloud,” in International Conference on Information Communication and Embedded Systems (ICICES2014), Chennai, India: IEEE, Feb. 2014, pp. 1-7.DOI
6 
“Kubernetes.” (accessed Aug. 28, 2023).URL
7 
S. De Sameer and R. Prashant Singh, “Selective Analogy of Mechanisms and Tools in Kubernetes Lifecycle for Disaster Recovery,” in 2022 IEEE 2nd International Conference on Mobile Networks and Wireless Communications (ICMNWC), Tumkur, Karnataka, India: IEEE, Dec. 2022, pp. 1-6.DOI
8 
“Velero.” (accessed Aug. 29, 2023).DOI
9 
“K10 Overview—K10 6.0.6 documentation.” (accessed Aug. 29, 2023).DOI
10 
A. A. Tamimi, R. Dawood, and L. Sadaqa, “Disaster Recovery Techniques in Cloud Computing,” 2019 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology (JEEIT), 2019, pp. 845-850,DOI
11 
X. Yu, D. Wang, X. Sun, B. Zheng, and Y. Du, “Design and Implementation of a Software Disaster Recovery Service for Cloud Computing-Based Aerospace Ground Systems,” in 2022 11th International Conference on Communications, Circuits and Systems (ICCCAS), Singapore, Singapore: IEEE, May 2022, pp. 220-225.DOI
12 
S. Challagidad, S. Dalawai, and N. Birje, “Efficient and Reliable Data Recovery Technique in Cloud Computing,” Internet of Things and Cloud Computing, vol. 5, no. 1, pp. 13-18, 2017.DOI
13 
J. Yu and L. Yang, “The Cloud Technology Double Live Data Center Information System Research and Design based on Disaster Recovery Platform,” Procedia Engineering. vol. 174, pp. 1356-1370, 2016.URL
14 
L. Wang, R. E. Harper, R. Mahindru, and H. V. Ramasamy, “Disaster Recovery for Cloud-Hosted Enterprise Applications,” the 9th IEEE International Conference on Cloud Computing (CLOUD), pp. 432-439, 2016.DOI
15 
A. Poniszewska-Marańda and E. Czechowska, “Kubernetes Cluster for Automating Software Production Environment,” Sensors, vol. 21, no. 5, Art. no. 1910, Mar. 2021.
16 
“Persistent Volumes,” Kubernetes. (accessed Dec. 04, 2022).
17 
“Portworx Backup Documentation.” (accessed Aug. 29, 2023).
18 
“2. Overview—KubeDR documentation.” (accessed Aug. 29, 2023).
19 
“Rancher Brand Guidelines & Resources,” Rancher Labs. (accessed Dec. 04, 2022).
20 
“Documentation.” (accessed Aug. 29, 2023).
21 
“Razee.” (accessed Aug. 29, 2023).
22 
“Red Hat OpenShift Enterprise Kubernetes Container Platform.” (accessed Aug. 29, 2023).
Jibeom Kim

Jibeom Kim received a Bachelor’s degree in Computer Science and Information Communication Engineering from Hongik University in Sejong, South Korea, in 2022. He enrolled in a Master’s program in 2022. He has researched distributed systems and Kubernetes disaster recovery and actively participated in a research project (CDM_Cloud: Multicloud Data Protection and Management Platform) supported by the IITP (Institute for Information & Communication Technology Planning & Evaluation).

Eun-Sung Jung

Eun-Sung Jung received a Bachelor’s degree in Electrical Engineering from Seoul National University in 1996 and a Master’s degree in Electrical Engineering in 1998. In 2010, he obtained a Ph.D. in Computer Engineering from the University of Florida. From 1998 to 2000, he worked as a Research Engineer at LG Industrial Systems, and from 2000 to 2005, he was a Team Leader at MacroImpact. From 2011 to 2012, he was a Principal Engineer at Samsung Advanced Institute of Technology. He then worked as a postdoctoral researcher at Argonne National Lab from 2013 to 2016, and during the summers of 2016, 2017, 2018, and 2019, he participated as Visiting Faculty in the Faculty Research Program at Argonne National Lab. Since 2016, he has been an Associate Professor at Hongik University, Sejong, South Korea. Prof. Jung is a member of IEICE and a Senior Member of IEEE. In 2011, he served as a journal reviewer for IEEE Transactions on Parallel and Distributed Systems, IEEE Transactions on Computers, and many other journals.