D4.3 - A Proposal for Data Confidentiality and Deduplication

Cloud storage services have become an integral part of our daily lives. With more and more people operating multiple devices, cloud storage promises a convenient means for users to store, access, and seamlessly synchronize their data from multiple devices. With the ever-increasing amount of data produced worldwide, the cloud offers a cheaper and more reliable alternative to local storage. Existing cloud service providers such as Amazon S3, Microsoft Azure, or Dropbox offer a good trade-off between quality of service and cost effectiveness. Most existing cloud storage providers rely on data deduplication, which significantly reduces storage costs by storing duplicate data only once. The cloud has also gained many clients among SMEs and large businesses that are mainly interested in storing large amounts of data while minimizing the costs of both storage and infrastructure management and maintenance.

While the benefits of cloud storage are clear, several issues have not been fully solved. The first problem is ensuring the confidentiality of data outsourced to the cloud. Even though cloud services rely on encryption mechanisms to guarantee data confidentiality, the necessary keying material can be acquired by means of backdoors, bribery, or coercion, which leads to data compromise. Existing solutions to this problem are inefficient and incur considerable overhead, especially for large files. The second problem is securing data deduplication over encrypted data. The third problem is the information leakage associated with data deduplication on a storage server. Even if the underlying client-side encryption is secure, we show that the storage provider can still acquire considerable information about the stored files without knowledge of the encryption key.

This deliverable presents our novel solutions to the above problems. Our contributions are summarized as follows. More details can be found in the sections of this deliverable and in our publications.

To provide confidentiality of data stored in the cloud, we study data confidentiality against an adversary that knows the encryption key and has access to a large fraction of the ciphertext blocks. To this end, we propose Bastion, a novel and efficient scheme that guarantees data confidentiality even if the encryption key is leaked and the adversary has access to almost all ciphertext blocks. We analyse the security of Bastion and evaluate its performance by means of a prototype implementation. We also discuss practical insights with respect to the integration of Bastion in commercial dispersed storage systems. Our evaluation results suggest that Bastion is well suited for integration in existing systems, since it incurs less than 5% overhead compared to existing semantically secure encryption modes.

Regarding transparent data deduplication in the cloud, we propose two novel solutions: ClearBox and PerfectDedup. ClearBox enables cloud users to verify the effective storage space that their data occupies in the cloud, and consequently to check whether they qualify for benefits such as price reductions. ClearBox is secure against malicious users and a rational storage provider, and ensures that files can only be accessed by their legitimate owners. We evaluate a prototype implementation of ClearBox using both Amazon S3 and Dropbox as back-end cloud storage. Our findings show that our solution works with the APIs provided by existing service providers without any modifications and achieves performance comparable to existing solutions. PerfectDedup, in turn, enables the cloud to securely detect and deduplicate redundant data blocks while they are encrypted. PerfectDedup applies different encryption techniques depending on the popularity of the data. Popular data are assumed to be less sensitive and shared among a large number of users, and are therefore protected under convergent encryption only, whereas unpopular data segments, which are likely to remain personal and unique, are encrypted with semantically secure symmetric encryption. We have implemented a prototype of this new mechanism and evaluated its performance. We show that, compared to existing solutions, PerfectDedup incurs less storage and communication overhead. Additionally, we devise a new key generation protocol that enables cloud users to encrypt redundant data with the same encryption key. This new message-locked key generation protocol provides better security guarantees than existing protocols.
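The difference between the two encryption regimes mentioned above can be illustrated with a minimal sketch. It is not the PerfectDedup implementation: the function names are hypothetical, and the cipher is a toy keystream built from SHA-256 purely so that the example runs with the Python standard library (a real system would use a vetted cipher such as AES). The point it shows is that convergent encryption derives the key from the data itself, so identical plaintexts yield identical ciphertexts that a server can deduplicate, whereas encryption under a fresh random key does not.

```python
import hashlib
import secrets
from typing import Tuple


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode). Illustration only, not for real use."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


def convergent_encrypt(data: bytes) -> Tuple[bytes, bytes]:
    """Key is the hash of the plaintext, so encryption is deterministic."""
    key = hashlib.sha256(data).digest()
    return key, _keystream_xor(key, b"", data)


def random_key_encrypt(data: bytes) -> Tuple[bytes, bytes, bytes]:
    """Fresh random key and nonce: equal plaintexts give unrelated ciphertexts."""
    key = secrets.token_bytes(32)
    nonce = secrets.token_bytes(16)
    return key, nonce, _keystream_xor(key, nonce, data)


def decrypt(key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    """XOR with the same keystream inverts the encryption."""
    return _keystream_xor(key, nonce, ciphertext)


def dedup_tag(ciphertext: bytes) -> str:
    """Hypothetical server-side index: equal ciphertexts map to the same tag."""
    return hashlib.sha256(ciphertext).hexdigest()
```

Two users who upload the same popular block with `convergent_encrypt` produce the same `dedup_tag`, so the server can store it once. The same block encrypted with `random_key_encrypt` produces different tags on each upload, which prevents deduplication but also hides equality from the server.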

With respect to information leakage in deduplicated storage systems, we analyse the leakage associated with data deduplication with respect to a curious storage server. We show that even if the data is encrypted using a key not known to the storage server, the latter can still acquire considerable information about the stored files and even determine which files are stored. We validate our results both analytically and experimentally using a number of real storage datasets.
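To give an intuition for how a server might learn something without the key, the following sketch uses one possible side channel: the lengths of the encrypted chunks it stores. This is an assumption made for illustration and is not taken from the analysis in this deliverable. It further assumes that chunk boundaries depend deterministically on file content, so that a known file always produces the same sequence of chunk lengths. The function name and the example data are hypothetical.

```python
from typing import Dict, List, Sequence


def match_candidates(observed: Sequence[int],
                     candidates: Dict[str, Sequence[int]]) -> List[str]:
    """Return candidate files whose chunk-length sequence appears
    contiguously in the chunk lengths observed by the server."""
    obs = list(observed)
    hits = []
    for name, lengths in candidates.items():
        pattern = list(lengths)
        n = len(pattern)
        # Slide a window of size n over the observed lengths.
        if n and any(obs[i:i + n] == pattern for i in range(len(obs) - n + 1)):
            hits.append(name)
    return hits


# Hypothetical example: lengths of encrypted chunks seen in one upload,
# and length fingerprints the server precomputed from files it knows.
observed_lengths = [4096, 1533, 8021, 2210, 977]
known_files = {
    "report.pdf": [1533, 8021, 2210],
    "photo.jpg": [1533, 2210],
}
matches = match_candidates(observed_lengths, known_files)
```

In this toy setting only `report.pdf` matches, because its three chunk lengths occur consecutively in the upload, while the lengths of `photo.jpg` do not. The sketch never touches plaintext or keys, only metadata that remains visible after encryption.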