- compression of encrypted data
- secure data deduplication
- proof of ownership with data confidentiality
Convergent encryption is a cryptographic primitive introduced in (Douceur, et al., 2002) that attempts to reconcile data confidentiality with the possibility of data deduplication.
Convergent encryption of a message consists of encrypting the plaintext using a deterministic (symmetric) encryption scheme with a key that is deterministically derived solely from the plaintext. Consequently, when two users independently encrypt the same file, they generate the same ciphertext, which can easily be deduplicated. Unfortunately, convergent encryption does not provide semantic security: it is vulnerable to content-guessing attacks, in which an adversary who can enumerate the likely contents of a file encrypts each candidate and compares the result with the stored ciphertext.
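The convergent encryption construction and its content-guessing weakness can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the key is SHA-256 of the plaintext, and a toy SHA-256 counter-mode keystream stands in for the deterministic symmetric cipher (a real system would use a vetted cipher such as AES). All function names are hypothetical.

```python
import hashlib
from typing import List, Optional, Tuple


def _keystream(key: bytes, length: int) -> bytes:
    # Toy keystream SHA-256(key || counter); a stand-in for AES-CTR with a fixed IV.
    # A fixed IV is tolerable here only because each key is tied to a single plaintext.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def convergent_encrypt(plaintext: bytes) -> Tuple[bytes, bytes]:
    """Return (key, ciphertext); both depend only on the plaintext."""
    key = hashlib.sha256(plaintext).digest()  # K = H(M)
    stream = _keystream(key, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))  # C = E_K(M)
    return key, ciphertext


def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    stream = _keystream(key, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


def guess_content(target: bytes, candidates: List[bytes]) -> Optional[bytes]:
    # Content-guessing attack: no key needed, just re-encrypt each candidate and compare.
    for candidate in candidates:
        if convergent_encrypt(candidate)[1] == target:
            return candidate
    return None


# Two users encrypt the same file independently and obtain identical ciphertexts,
# so the storage provider can keep a single copy.
key_a, ct_a = convergent_encrypt(b"salary: 52000")
key_b, ct_b = convergent_encrypt(b"salary: 52000")
assert ct_a == ct_b and convergent_decrypt(key_a, ct_a) == b"salary: 52000"

# An attacker who knows the file format can brute-force the low-entropy field.
guesses = [f"salary: {amount}".encode() for amount in range(50000, 60000, 1000)]
assert guess_content(ct_a, guesses) == b"salary: 52000"
```

The same determinism that enables deduplication is what enables the attack: whenever the set of plausible plaintexts is small, re-encrypting each candidate is enough to identify the stored file.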
In TREDISEC, we aim to design solutions for privacy-preserving data deduplication that do not rely on fully trusted entities; instead, they will leverage novel mechanisms to ensure that only the data owner can disclose the content of its data.
- distributed (ABAC) policy enforcement with multi-tenancy
- efficient shared ownership
- secure data deletion
Cloud systems are composed of several, often complex, software modules: in the presence of vulnerabilities or colluding privileged users, a malicious entity can subvert the correct execution of the system and compromise data confidentiality and integrity.
Perhaps counter-intuitively, when it comes to a storage system, access control rules must include support for secure data deletion; that is, the rightful owner must be able to instruct the system to destroy every copy of their data, including cached, snapshotted, replicated, and erasure-coded copies. Traditional solutions (e.g. digital shredding with overwrite patterns) are either impractical at the scale of today's cloud storage systems, not fine-grained enough, or ineffective on specific media (e.g. the log-structured storage used in modern SSDs, where an overwrite is written to a new physical location rather than in place). Cryptographic solutions to this problem have been proposed (Cachin, et al., 2013), but, as we shall see later, they are ineffective when combined with storage efficiency functions, deduplication in particular.
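The general idea behind cryptographic deletion (often called crypto-shredding) can be sketched as follows: each file is encrypted under its own random key, the ciphertext may be copied freely across replicas, caches, and snapshots, and deletion only has to destroy the small key. This is a simplified illustration, not the scheme of (Cachin, et al., 2013), which is considerably more elaborate; the class `CryptoShreddingStore` and the toy SHA-256 keystream are hypothetical choices made for the example.

```python
import hashlib
import os
from typing import Dict, List, Optional


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy SHA-256 counter-mode keystream standing in for a real cipher such as AES-CTR.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(d ^ s for d, s in zip(data, stream))


class CryptoShreddingStore:
    """Hypothetical store: one random key per file, ciphertext copied freely."""

    def __init__(self, num_replicas: int = 3) -> None:
        # Small key store ASSUMED to support reliable erasure (e.g. an HSM);
        # note that Python's dict.pop does not actually wipe memory.
        self._keys: Dict[str, bytes] = {}
        # Untrusted bulk storage standing in for replicas, caches and snapshots.
        self.replicas: List[Dict[str, bytes]] = [{} for _ in range(num_replicas)]

    def put(self, file_id: str, data: bytes) -> None:
        key, nonce = os.urandom(32), os.urandom(16)
        blob = nonce + _xor(data, _keystream(key, nonce, len(data)))
        self._keys[file_id] = key
        for replica in self.replicas:
            replica[file_id] = blob

    def get(self, file_id: str) -> Optional[bytes]:
        key = self._keys.get(file_id)
        if key is None:
            return None  # key destroyed: remaining ciphertext copies are unreadable
        blob = self.replicas[0][file_id]
        nonce, body = blob[:16], blob[16:]
        return _xor(body, _keystream(key, nonce, len(body)))

    def delete(self, file_id: str) -> None:
        # Only the 32-byte key is destroyed; the ciphertext copies need not be located.
        self._keys.pop(file_id, None)


store = CryptoShreddingStore()
store.put("report.pdf", b"confidential figures")
store.delete("report.pdf")
assert store.get("report.pdf") is None
assert all("report.pdf" in replica for replica in store.replicas)
```

One way to see the tension with deduplication: if several owners share one deduplicated ciphertext, they also share its key, so no single owner can destroy that key without affecting the others.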
We postulate that existing cloud storage platforms are still too weak at isolating tenants and containing attacks, and argue that the threat of unknown vulnerabilities, and the resulting loss of data governance, remains one of the main reasons why businesses are wary of the cloud. Yet without resource sharing, the cloud model cannot be implemented successfully.
- privacy-preserving word search with data reduction
- privacy-preserving word search with multi-tenancy
Confidentiality of data requires that, when users outsource data, the cloud learns no information about the data it stores or the operations performed on it.
Although classical encryption algorithms ensure data confidentiality, they prevent the cloud from operating over encrypted data. The obvious approach is to encrypt all data with a secure encryption algorithm such as AES and store it in the cloud. However, while secure, the data can then no longer be processed in the cloud: it has to be downloaded and decrypted on the client before any query can be executed. This undermines any serious Database-as-a-Service offering, yet it appears to be how many traditional DBMSs such as Sybase, Oracle, and DB2, as well as services such as Dropbox, work when they claim to encrypt data and provide cloud storage.
Moreover, both the queries issued by the user and their results should remain hidden from the cloud. Existing cryptographic primitives such as searchable encryption or private information retrieval cannot be adopted directly by current cloud solutions.
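To make the idea of searchable encryption concrete, the sketch below builds a deliberately naive encrypted keyword index: the client maps each word to an HMAC-SHA-256 token under a secret key, and the server matches tokens without ever seeing the words. It is an illustrative assumption, not a scheme proposed in TREDISEC; the classes `Server` and `Client` are hypothetical, and document bodies are assumed to be encrypted separately with a standard cipher.

```python
import hashlib
import hmac
import os
from collections import defaultdict
from typing import DefaultDict, List, Set


def search_token(key: bytes, word: str) -> bytes:
    # Deterministic keyword token: the server sees HMAC outputs, never the words.
    return hmac.new(key, word.lower().encode(), hashlib.sha256).digest()


class Server:
    """Untrusted cloud: holds only opaque tokens and document identifiers."""

    def __init__(self) -> None:
        self.index: DefaultDict[bytes, Set[str]] = defaultdict(set)

    def add(self, token: bytes, doc_id: str) -> None:
        self.index[token].add(doc_id)

    def search(self, token: bytes) -> List[str]:
        # .get does not create entries, so failed searches leave the index unchanged.
        return sorted(self.index.get(token, set()))


class Client:
    """Data owner: keeps the index key; document bodies are encrypted elsewhere."""

    def __init__(self) -> None:
        self._key = os.urandom(32)

    def upload(self, server: Server, doc_id: str, text: str) -> None:
        for word in set(text.split()):
            server.add(search_token(self._key, word), doc_id)

    def query(self, word: str) -> bytes:
        return search_token(self._key, word)


cloud, owner = Server(), Client()
owner.upload(cloud, "doc1", "quarterly audit report")
owner.upload(cloud, "doc2", "audit log")
print(cloud.search(owner.query("audit")))  # ['doc1', 'doc2']
```

Even this toy hints at why such primitives are hard to adopt as-is: identical queries produce identical tokens (search-pattern leakage), the server learns which documents match (access-pattern leakage), and the index alone reveals which documents share keywords before any query is issued.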
- proofs of retrievability with deduplication
- verifiable computation
- system integrity verification
Whereas proofs of ownership (POW) assure that a client indeed possesses a given file, Provable Data Possession (PDP) and Proof of Retrievability (PoR) address the dual problem of ensuring, at the client side, that a server still stores the files it is supposed to store. PoR and PDP schemes address the requirements of data integrity (ensuring that data has not undergone malicious modifications) and availability (ensuring that data is still available in its entirety and can be downloaded if needed).
Trusted Execution Environments offer a way of securing PoR and PDP protocols. In particular, trusted-computing-based systems can generate proofs based on properties of the lower layers of the software stack and on the function set of the Trusted Platform Module (TPM). While feasible in theory, such approaches still suffer from the limitations highlighted in the previous sections.
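One building block behind such trusted-computing proofs is the TPM's Platform Configuration Register (PCR) extend operation, which folds the hash of each loaded component into a running digest that a verifier can recompute from known-good measurements. The sketch below models a single SHA-256 PCR in software; it omits the signed quote, the verifier's nonce, and the event log that real remote attestation relies on, and the function names are hypothetical.

```python
import hashlib
from typing import List

PCR_SIZE = 32  # one SHA-256 PCR


def extend(pcr: bytes, measurement: bytes) -> bytes:
    """TPM-style extend: PCR_new = SHA-256(PCR_old || measurement)."""
    return hashlib.sha256(pcr + measurement).digest()


def measured_boot(components: List[bytes]) -> bytes:
    """Measure each component in load order; the PCR is modelled as starting at zeros."""
    pcr = bytes(PCR_SIZE)
    for component in components:
        pcr = extend(pcr, hashlib.sha256(component).digest())
    return pcr


def matches_known_good(reported_pcr: bytes, known_good: List[bytes]) -> bool:
    """Verifier side: recompute the expected PCR from reference measurements."""
    return reported_pcr == measured_boot(known_good)


reference = [b"firmware v1.2", b"bootloader v2", b"hypervisor 4.17"]
assert matches_known_good(measured_boot(reference), reference)
# Swapping one component, or even reordering them, yields a different PCR value.
tampered = [b"firmware v1.2", b"evil loader", b"hypervisor 4.17"]
assert not matches_known_good(measured_boot(tampered), reference)
assert not matches_known_good(measured_boot(reference[::-1]), reference)
```

Because extend is a one-way hash chain, software loaded later cannot roll the register back to a "good" value; in a real deployment the verifier also checks the TPM's signature over the PCR values, so a compromised host cannot simply report the expected digest.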
Encryption keys stored on the hard disk are susceptible to tampering. TPM solutions offer hardware-protected storage of keys and protect authentication credentials by binding them to the platform, providing a stronger mechanism to prevent unauthorized access to the platform and thus to preserve the integrity of the stored data. Authentication built on top of trusted computing services (based on the use of TPMs) provides higher degrees of assurance, but the performance overheads it introduces can be significant.
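The binding of keys to the platform can be illustrated with a software model of TPM sealing: a secret is wrapped under a key derived from a chip-resident storage secret and the current PCR value, so it can only be unsealed on the same platform in the same measured state. `ToyTPM` is a hypothetical class, not a real TPM interface; the SHA-256-based wrapping is an assumption chosen to keep the example dependency-free, whereas a real TPM enforces this inside dedicated hardware.

```python
import hashlib
import hmac
import os
from typing import Optional


class ToyTPM:
    """Hypothetical software model of sealing; NOT a real TPM API."""

    def __init__(self) -> None:
        self._storage_root = os.urandom(32)  # in real hardware this never leaves the chip
        self.pcr = bytes(32)                  # modelled as all zeros after reset

    def extend(self, component: bytes) -> None:
        # Measured boot: PCR_new = SHA-256(PCR_old || SHA-256(component)).
        self.pcr = hashlib.sha256(self.pcr + hashlib.sha256(component).digest()).digest()

    def _wrapping_key(self) -> bytes:
        # Depends on both the chip secret and the current platform state.
        return hmac.new(self._storage_root, self.pcr, hashlib.sha256).digest()

    def seal(self, secret: bytes) -> bytes:
        if len(secret) > 32:
            raise ValueError("toy model seals at most 32 bytes (e.g. one AES-256 key)")
        key = self._wrapping_key()
        pad = hashlib.sha256(key + b"pad").digest()[: len(secret)]
        body = bytes(s ^ p for s, p in zip(secret, pad))
        return body + hmac.new(key, body, hashlib.sha256).digest()

    def unseal(self, blob: bytes) -> Optional[bytes]:
        body, mac = blob[:-32], blob[-32:]
        key = self._wrapping_key()
        if not hmac.compare_digest(hmac.new(key, body, hashlib.sha256).digest(), mac):
            return None  # different chip or different measured state: refuse
        pad = hashlib.sha256(key + b"pad").digest()[: len(body)]
        return bytes(b ^ p for b, p in zip(body, pad))


tpm = ToyTPM()
for component in (b"bootloader", b"kernel"):
    tpm.extend(component)
disk_key = os.urandom(32)
sealed = tpm.seal(disk_key)
assert tpm.unseal(sealed) == disk_key
tpm.extend(b"tampered driver")  # platform state changes after sealing
assert tpm.unseal(sealed) is None
```

Unlike a key stored on disk, the sealed blob is useless if copied to another machine or if the boot chain is modified; the price, as noted above, is that every such check involves the TPM.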