Decoupling Consistency and Trust from Underlying Distributed Data Stores
Building applications on cloud services is cost-effective and allows for rapid development and release cycles. However, relying on cloud services can severely limit applications' ability to define their own policies, get the consistency they want, and retain control of what data is visible during replication. To understand the tension between strong consistency and security guarantees on one hand and high availability, flexible replication, and performance on the other, it helps to consider two questions. First, is it possible for an application to achieve stricter consistency guarantees than what the cloud provider offers? If we solely rely on the provider service interface, the answer is no. However, if we allow the applications to determine the implementation and the execution of the consistency protocols, then we can achieve much more. The second question is, can an application relay updates over untrusted replicas without revealing sensitive information while maintaining the desired consistency guarantees? Simply encrypting the data is not enough. Encryption does not eliminate information leakage that comes from the meta-data needed for the execution of any consistency protocol. The alternative to encryption---allowing the flow of updates only through trusted replicas---leads to predefined communication patterns. This approach is prone to failures that can cause partitioning in the system. One way to answer ``yes'' to this question is to allow trust relationships, defined atthe application level, to guide the synchronization protocol. My goal in this thesis is to build systems that take advantage of the performance, scalability, and availability of the cloud storage services while, at the same time, bypassing the limitations imposed by cloud service providers' design choices. The key to achieving this is pushingapplication-specific decisions where they belong: the application. I defend the following thesis statement: By decoupling consistency determination and trust from the underlying distributed data store, it is possible to (1) support application-specific consistency guarantees; (2) allow for topology independent replication protocols that do not compromise application privacy. First, I design and implement Shell, a system architecture for supporting strict consistency guarantees over eventually consistent data stores. Shell is a software layer designed to isolate consistency implementations and cloud-provider APIs from the application code. Shell consists of four internal modules and an application-store, which together abstract consistency-related operations and encapsulate communication with the underlying storage layers. Apart from consistency protocols tailored to application needs, Shell provides application-aware conflict resolution without relying on generic heuristics such as the ``last write wins.'' Shell does not require the application to maintain dependency-tracking for the execution of the consistency protocols as other existing approaches do. I experimentally evaluate Shell over two different data-stores using real-application traces. I found that using Shell can reduce the inconsistent updates by 10%. I also measure and show the overheads that come from introducing the Shell layer. Second, I design and implement T.Rex, a system for supporting topology-independent replication without the assumption of trust between all the participating replicas. T.Rex uses role-based access control to enable flexible and secure sharing among users with widely varying collaboration types: both users and data items are assigned roles, and a user can access data only if it shares at least one role. Building on top of this abstraction, T.Rex includes several novel mechanisms: I introduce role proofs to prove role membership to others in the role without leaking information to those not in the role. Additionally, I introduce role coherence to prevent updates from leaking across roles. Finally, I use Bloom filters as opaque digests to enable querying of remote cache state without being able to enumerate it. I combine these mechanisms to develop a novel, cryptographically secure, and efficient anti-entropy protocol, T.Rex-Sync. I evaluate T.Rex on a local test-bed, and I show that it achieves security with modest computational and storage overheads.
Chair: Dr. Pete Keleher Dean's rep: Dr. Mark Shayman Members: Dr. Neil Spring Dr. Udaya Shankar Dr. Yiannis Aloimonos