Use Case 6: Database Migration into a Secure Cloud

Partner: SAP
This use case describes the migration of a company’s legacy data into a secure cloud environment. As an example, consider a mid-sized company that wants to move from an on-premise ERP solution to a cloud solution. However, none of the sensitive information in its data should be accessible in the clear outside the data owner’s company. Hence, the data must be encrypted.
Encrypting legacy data, which can easily amount to many gigabytes or even terabytes, could take several months. Hence, the migration process needs significant computational resources, e.g., an on-premise cluster, to speed it up. Moreover, the encrypted data should be optimized for storage space, adapted to the data owner’s sensitivity requirements, and allow optimal performance for the queries later executed on it. Furthermore, the data should be stored in the cloud provider’s database in such a way that multi-tenancy can be realized for the cloud provider’s cost benefit, while ensuring that no other tenant can access the data owner’s data.
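To make the parallelization and tenant-isolation requirements more concrete, here is a minimal Python sketch. It is not the TREDISEC service. The legacy table is split into chunks that are encrypted concurrently, and each tenant gets its own derived key, so tenants sharing one table cannot decrypt each other's rows. The HMAC-SHA256 stream cipher, the key-derivation label and all function names are assumptions, chosen so that the example runs on the Python standard library alone.

```python
import hashlib
import hmac
import os
from concurrent.futures import ThreadPoolExecutor

# Stand-in cipher for illustration only: HMAC-SHA256 in counter mode produces
# a keystream (confidentiality only, no integrity). A real migration would use
# a vetted cipher such as AES-GCM from a cryptographic library.
NONCE_LEN = 16


def derive_tenant_key(master_key: bytes, tenant_id: str) -> bytes:
    """Hypothetical per-tenant key derivation: rows of different tenants in a
    shared table end up encrypted under different keys."""
    return hmac.new(master_key, b"tenant-key:" + tenant_id.encode(), hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt_value(key: bytes, plaintext: str) -> bytes:
    """Randomized encryption: a fresh nonce per value, so equal plaintexts
    produce different ciphertexts."""
    data = plaintext.encode()
    nonce = os.urandom(NONCE_LEN)
    return nonce + bytes(a ^ b for a, b in zip(data, _keystream(key, nonce, len(data))))


def decrypt_value(key: bytes, ciphertext: bytes) -> str:
    nonce, body = ciphertext[:NONCE_LEN], ciphertext[NONCE_LEN:]
    return bytes(a ^ b for a, b in zip(body, _keystream(key, nonce, len(body)))).decode()


def _encrypt_chunk(key: bytes, chunk: list) -> list:
    return [tuple(encrypt_value(key, value) for value in row) for row in chunk]


def encrypt_table_parallel(key: bytes, rows: list, chunk_size: int = 1000, workers: int = 4) -> list:
    """Split a legacy table into fixed-size chunks, encrypt the chunks
    concurrently, and reassemble them in the original row order."""
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in submission order, so row order is preserved.
        encrypted_chunks = list(pool.map(lambda chunk: _encrypt_chunk(key, chunk), chunks))
    return [row for chunk in encrypted_chunks for row in chunk]
```

Threads keep the sketch portable. In CPython, however, they do not speed up this CPU-bound work, because of the global interpreter lock. The same one-chunk-per-task split carries over directly to a process pool or to the nodes of an on-premise cluster.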

Business Context
Small, midsized and large enterprises increasingly use functionality provided by cloud services. This may include ERP, CRM, HR and other business applications that support day-to-day business processes, as well as simple data storage solutions with hosted databases. Using such hosted functionality requires outsourcing the company’s data, including highly sensitive data, to a cloud provider. Storing sensitive data outside a company’s own premises exposes it to the risk of misuse. For instance, an honest-but-curious database administrator at the cloud provider can easily access any stored data. It is therefore in the company’s best interest to store its data securely (i.e., encrypted). Usually, this comes with the restriction that no application can process the data without decrypting it first. There are, however, solutions such as CryptDB that store encrypted data while still allowing SQL statements to be executed directly on it. Using such a solution requires that all legacy data, which can easily amount to terabytes, be encrypted before being transferred to and stored at the cloud provider. This can take weeks to months, leading to potential downtime. Alternatively, a fixed snapshot of the database can be encrypted in parallel with day-to-day business. This, however, requires a complex update of the encrypted data afterwards, because all changes made in the meantime must also be applied. Hence, an optimized solution for securely migrating large sets of business data into a secure cloud is required.

Technology Context
The encryption service should be provided as Software as a Service (SaaS), e.g., hosted in a private cloud. A company would then use the encryption service to encrypt its legacy data before storing it with the cloud provider. The encryption should be based on a framework similar to CryptDB, which encrypts data in a database in such a way that SQL statements can still be executed over the encrypted data.
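CryptDB supports equality predicates over ciphertexts by using deterministic encryption, under which equal plaintexts map to equal ciphertexts. The minimal Python sketch below illustrates that idea with an SIV-style construction built from HMAC-SHA256. This construction stands in for the block-cipher-based scheme CryptDB actually uses. The example stores the data in an in-memory SQLite table. The keys, the table and the function names are hypothetical.

```python
import hashlib
import hmac
import sqlite3

# Hypothetical demo keys. Real keys would stay with the data owner, and a
# CryptDB-style system would derive a separate key per column.
K_MAC = bytes(range(32))
K_ENC = bytes(range(32, 64))
IV_LEN = 16


def _keystream(key: bytes, iv: bytes, length: int) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, iv + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def det_encrypt(plaintext: str) -> bytes:
    """SIV-style deterministic encryption: the IV is a MAC of the plaintext,
    so equal plaintexts always yield equal ciphertexts."""
    data = plaintext.encode()
    iv = hmac.new(K_MAC, data, hashlib.sha256).digest()[:IV_LEN]
    return iv + bytes(a ^ b for a, b in zip(data, _keystream(K_ENC, iv, len(data))))


def det_decrypt(ciphertext: bytes) -> str:
    iv, body = ciphertext[:IV_LEN], ciphertext[IV_LEN:]
    data = bytes(a ^ b for a, b in zip(body, _keystream(K_ENC, iv, len(body))))
    # Recomputing the synthetic IV also detects tampered ciphertexts.
    if not hmac.compare_digest(iv, hmac.new(K_MAC, data, hashlib.sha256).digest()[:IV_LEN]):
        raise ValueError("ciphertext failed integrity check")
    return data.decode()


def find_customers_in(city: str) -> list:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customers (name BLOB, city BLOB)")
    for name, c in [("Alice", "Berlin"), ("Bob", "Walldorf"), ("Carol", "Berlin")]:
        db.execute("INSERT INTO customers VALUES (?, ?)", (det_encrypt(name), det_encrypt(c)))
    # The database compares ciphertexts only; it never sees the city in the clear.
    rows = db.execute("SELECT name FROM customers WHERE city = ?", (det_encrypt(city),)).fetchall()
    db.close()
    return sorted(det_decrypt(row[0]) for row in rows)


print(find_customers_in("Berlin"))  # prints ['Alice', 'Carol']
```

The price of equality matching is leakage: the server learns which rows share a value. This is why CryptDB only exposes its deterministic layer on a column once a query actually needs it.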

A specific JDBC driver may be used to connect to a database containing encrypted data (see Figure). The goal of the driver is to give the client application transparent access to the encrypted data, meaning that a large set of regular SQL queries can be used to search over it. The encryption service is required for the initial setup, i.e., the initial encryption of the sensitive data.
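The following Python sketch mirrors the driver's role as an application-side wrapper around a standard sqlite3 connection. Plaintext parameters are encrypted before they reach the database, and result columns are decrypted before they reach the application. It is not the JDBC driver itself. The class name, the use of a single deterministic scheme for all columns, and the HMAC-based cipher are all assumptions.

```python
import hashlib
import hmac
import sqlite3


class EncryptingConnection:
    """Hypothetical client-side wrapper: the application works with plaintext,
    while the database stores and compares only ciphertexts. Assumes every
    column is a text column encrypted with one SIV-style deterministic scheme,
    and that values are passed only as bound parameters."""

    IV_LEN = 16

    def __init__(self, conn: sqlite3.Connection, k_mac: bytes, k_enc: bytes):
        self._conn = conn
        self._k_mac = k_mac
        self._k_enc = k_enc

    def _keystream(self, iv: bytes, length: int) -> bytes:
        out = bytearray()
        counter = 0
        while len(out) < length:
            out += hmac.new(self._k_enc, iv + counter.to_bytes(8, "big"), hashlib.sha256).digest()
            counter += 1
        return bytes(out[:length])

    def _encrypt(self, value: str) -> bytes:
        data = value.encode()
        iv = hmac.new(self._k_mac, data, hashlib.sha256).digest()[: self.IV_LEN]
        return iv + bytes(a ^ b for a, b in zip(data, self._keystream(iv, len(data))))

    def _decrypt(self, blob: bytes) -> str:
        iv, body = blob[: self.IV_LEN], blob[self.IV_LEN :]
        data = bytes(a ^ b for a, b in zip(body, self._keystream(iv, len(body))))
        if not hmac.compare_digest(iv, hmac.new(self._k_mac, data, hashlib.sha256).digest()[: self.IV_LEN]):
            raise ValueError("ciphertext failed integrity check")
        return data.decode()

    def execute(self, sql: str, params: tuple = ()) -> list:
        """Encrypt parameters on the way in, decrypt result columns on the way out."""
        cursor = self._conn.execute(sql, tuple(self._encrypt(p) for p in params))
        rows = [tuple(self._decrypt(v) for v in row) for row in cursor.fetchall()]
        self._conn.commit()
        return rows


raw = sqlite3.connect(":memory:")
app = EncryptingConnection(raw, k_mac=b"m" * 32, k_enc=b"e" * 32)
app.execute("CREATE TABLE employees (name BLOB, dept BLOB)")
app.execute("INSERT INTO employees VALUES (?, ?)", ("Alice", "HR"))
app.execute("INSERT INTO employees VALUES (?, ?)", ("Bob", "Sales"))
print(app.execute("SELECT name FROM employees WHERE dept = ?", ("Sales",)))  # prints [('Bob',)]
```

Unlike a full driver, this sketch does not parse or rewrite SQL. Values written as literals in the query text would therefore reach the database unencrypted. Range queries would additionally need an order-preserving scheme.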

Expected Outcomes and Contribution of TREDISEC
On-premise applications migrated into the cloud (e.g., an ERP solution moved to a cloud provider) may contain sensitive data. Data encryption, however, limits the functionality of those applications. Processing techniques for encrypted data that support database queries are therefore required. Accordingly, the expected outcome of TREDISEC is a concept for a service that optimizes the encryption of large data sets using different encryption schemes, while maintaining query processing over the encrypted data.
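Combining different encryption schemes raises the question of which scheme each column needs. As a minimal sketch, the Python snippet below follows the CryptDB pattern of matching the operation a query performs on a column to a scheme that supports it: DET for equality, OPE for order, HOM for sums, SEARCH for keyword search, and RND when no server-side operation is needed. The function name, the "PLAIN" label for non-sensitive columns and the example column profile are hypothetical. It also reflects the use case's sensitivity and storage requirements: non-sensitive columns stay unencrypted, and a column gets no more schemes than its queries need.

```python
from typing import Iterable, Set

# Operation a query needs on a column -> CryptDB-style scheme that supports it.
SCHEME_FOR_OPERATION = {
    "equality": "DET",           # =, GROUP BY, COUNT(DISTINCT): deterministic encryption
    "range": "OPE",              # <, >, ORDER BY, MIN/MAX: order-preserving encryption
    "sum": "HOM",                # SUM: additively homomorphic encryption (e.g., Paillier)
    "keyword_search": "SEARCH",  # LIKE on whole words: searchable encryption
}


def choose_schemes(sensitive: bool, operations: Iterable[str]) -> Set[str]:
    """Pick only the encryption schemes a column needs.

    Non-sensitive columns stay in plaintext. Sensitive columns never used in
    server-side predicates or aggregates get randomized encryption (RND),
    which leaks the least. Each extra scheme means an extra ciphertext copy,
    so keeping the set minimal also saves storage.
    """
    if not sensitive:
        return {"PLAIN"}
    ops = set(operations)
    unknown = ops - set(SCHEME_FOR_OPERATION)
    if unknown:
        raise ValueError(f"unsupported operations: {sorted(unknown)}")
    return {SCHEME_FOR_OPERATION[op] for op in ops} or {"RND"}


# Hypothetical column profile for an ERP customer table.
profile = {
    "country": (False, ["equality"]),
    "name": (True, ["equality"]),
    "revenue": (True, ["range", "sum"]),
    "notes": (True, []),
}
for column, (sensitive, ops) in profile.items():
    print(column, sorted(choose_schemes(sensitive, ops)))
```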