28 Feb 2018
Brigitta Lange & Ghassan Karame

ClearBox allows a storage service provider to transparently attest to its customers the deduplication patterns of the (encrypted) data that it is storing. By doing so, ClearBox enables cloud users to verify the effective storage space that their data is occupying in the cloud, and consequently to check whether they qualify for benefits such as price reductions.

The literature features a large number of proposals for securing data deduplication in the cloud. All these proposals share the goal of enabling cloud providers to deduplicate encrypted data stored by their users. Such solutions allow the cloud provider to reduce its total storage, while ensuring the confidentiality of stored data. By doing so, existing solutions increase the profitability of the cloud, but do not allow users to directly benefit from the savings of deduplication over their data.

ClearBox relies on a gateway to orchestrate cross-user file-based deduplication prior to storing files on (public) cloud servers. ClearBox ensures that files can only be accessed by their legitimate owners, resists a curious cloud provider, and enables cloud users to verify the effective storage space occupied by their encrypted files in the cloud (after deduplication). By doing so, ClearBox provides its users with full transparency on the storage savings exhibited by their data; this allows users to assess whether they are acquiring appropriate service and price reductions for their money, in spite of a rational gateway that aims at maximizing its profit.
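To make the idea of cross-user file-based deduplication concrete, here is a minimal sketch of a gateway-side index that keeps one stored copy per distinct file and tracks which users own it. Everything in it is hypothetical: the class name DedupIndex, its methods, and in particular the rule that each owner's "effective storage" is the file size divided by the number of owners are assumptions made for illustration, not ClearBox's actual design. The sketch also hashes plaintext content and ignores encryption entirely, whereas ClearBox operates on encrypted files.

```python
import hashlib
from collections import defaultdict


class DedupIndex:
    """Hypothetical gateway-side index: one stored copy per distinct file."""

    def __init__(self):
        self.owners = defaultdict(set)  # file digest -> set of user IDs
        self.sizes = {}                 # file digest -> size in bytes

    def upload(self, user_id: str, content: bytes) -> str:
        # Identical content maps to the same digest, so it is stored once.
        digest = hashlib.sha256(content).hexdigest()
        self.owners[digest].add(user_id)
        self.sizes[digest] = len(content)
        return digest

    def can_access(self, user_id: str, digest: str) -> bool:
        # Only users registered to the file count as legitimate owners.
        return user_id in self.owners.get(digest, set())

    def effective_bytes(self, user_id: str) -> float:
        # Assumed sharing rule: each owner is charged size / number of owners.
        return sum(
            self.sizes[d] / len(users)
            for d, users in self.owners.items()
            if user_id in users
        )


idx = DedupIndex()
shared = idx.upload("alice", b"x" * 100)
idx.upload("bob", b"x" * 100)      # deduplicated against alice's copy
idx.upload("alice", b"unique")     # stored only for alice
print(idx.effective_bytes("alice"), idx.effective_bytes("bob"))
```

In this toy model a user has to trust the gateway's owner count; the point of ClearBox is that users can verify such figures rather than take them on faith.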

ClearBox ensures a transparent attestation of the storage consumption of users whose data is effectively being deduplicated — without compromising the confidentiality of the stored data.

ClearBox employs a novel Merkle-tree-based cryptographic accumulator, maintained by the gateway, to efficiently accumulate the IDs of the users registered to the same file within the same time epoch. Our construct ensures that each user can check that his ID is correctly accumulated at the end of every epoch. Additionally, our accumulator encodes an upper bound on the number of accumulated values, so any legitimate client associated with the accumulator can verify this bound in time logarithmic in the number of clients that uploaded the same file.
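The following sketch shows how a plain Merkle tree can support the two checks described above: a user verifying that his ID is included, and a coarse upper bound on the number of accumulated IDs. It is a generic textbook Merkle membership proof, not the ClearBox accumulator itself, whose construction is not detailed here. All function names, the padding leaf, and the byte prefixes are assumptions. The bound 2^depth is only meaningful under the further assumption that every leaf sits at the same depth, which this sketch does not enforce against a misbehaving gateway.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


PAD = _h(b"\x02pad")  # hypothetical filler for unused leaf slots


def leaf(user_id: str) -> bytes:
    return _h(b"\x00" + user_id.encode())  # prefix separates leaves from nodes


def node(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01" + left + right)


def build_levels(user_ids):
    """Return all tree levels, leaves first, padded to a power of two."""
    leaves = [leaf(u) for u in user_ids]
    size = 1
    while size < len(leaves):
        size *= 2
    leaves += [PAD] * (size - len(leaves))
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([node(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def prove(levels, index):
    """Collect the sibling hash at each level on the way up to the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path


def verify(root, user_id, index, path):
    """Recompute the root from the user's own ID; cost is O(len(path))."""
    if not 0 <= index < 2 ** len(path):
        return False
    acc = leaf(user_id)
    for sibling in path:
        acc = node(acc, sibling) if index % 2 == 0 else node(sibling, acc)
        index //= 2
    return acc == root


def max_members(path):
    # A tree of depth d has at most 2**d leaf slots (assumes uniform depth).
    return 2 ** len(path)


users = ["alice", "bob", "carol"]
levels = build_levels(users)
root = levels[-1][0]
bob_path = prove(levels, 1)
print(verify(root, "bob", 1, bob_path), max_members(bob_path))
```

The proof holds one hash per tree level, so both its size and the verification time grow logarithmically with the number of users registered to the file.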

ClearBox is the first complete system that enables users to verify the storage savings exhibited by their data. We argue that ClearBox motivates a novel cloud pricing model which promises a fairer allocation of storage costs amongst users, without compromising data confidentiality or system performance. We believe that such a model provides strong incentives for users to store popular data in the cloud (since popular data will be cheaper to store) and discourages the upload of personal and unique content. As a by-product, the popularity of files additionally gives cloud users an indication of their level of privacy in the cloud; for example, the user can verify that his private files are not deduplicated, and thus have not been leaked.