In distributed storage systems, a common requirement is that data remain available to users even when storage nodes fail. In applications to distributed computing, where the storage system is used as a shared memory, it is also important that the data appear to users as if it were stored on a single centralized system, even though it resides on distributed servers; that is, the system should support atomic behavior. In this talk, we consider a distributed storage system in which the data is stored on server nodes and which supports concurrent access by readers and writers. The nodes of the storage system communicate via a reliable asynchronous message-passing system. For this system, we will present an algorithm that emulates an atomic shared memory as long as the number of server node failures is bounded. Our approach differs from traditional replication-based shared memory algorithms in that we use techniques from erasure coding to store the data. We formally analyze the communication and storage costs of our algorithm, and compare them with those of the replication-based shared memory algorithms of Attiya, Bar-Noy, and Dolev (awarded the Dijkstra Prize in 2011) and of Fan and Lynch. We will show that our technique incurs lower communication costs than the traditional algorithms. Time permitting, we will also show that a simple modification of our algorithm incurs lower storage costs under certain mild conditions. We will conclude by describing some interesting open research topics.
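To give a feel for why erasure coding can beat replication on storage cost, here is a minimal illustrative sketch (not the talk's algorithm): a toy (3, 2) erasure code built from two data chunks plus one XOR parity chunk. Any single server failure is tolerated while storing only 1.5x the object size, versus 3x for 3-way replication that tolerates the same single failure. All names and the specific code construction here are illustrative assumptions.

```python
# Toy (3, 2) erasure code: 2 data chunks + 1 XOR parity chunk.
# Illustrative only -- not the algorithm presented in the talk.

def encode(data: bytes) -> list:
    """Split data into two halves and add an XOR parity chunk."""
    half = (len(data) + 1) // 2
    a = data[:half]
    b = data[half:].ljust(half, b"\0")  # pad so both halves match in length
    parity = bytes(x ^ y for x, y in zip(a, b))
    return [a, b, parity]

def decode(chunks: list, length: int) -> bytes:
    """Recover the original data even if one of the three chunks is lost."""
    a, b, p = chunks
    if a is None:            # reconstruct first half from b XOR parity
        a = bytes(x ^ y for x, y in zip(b, p))
    if b is None:            # reconstruct second half from a XOR parity
        b = bytes(x ^ y for x, y in zip(a, p))
    return (a + b)[:length]

data = b"atomic shared memory"
chunks = encode(data)
chunks[1] = None             # simulate the failure of one server
assert decode(chunks, len(data)) == data

# Storage cost comparison for one object, tolerating one failure:
coded_cost = sum(len(c) for c in encode(data))  # 1.5x the object size
replicated_cost = 3 * len(data)                 # 3x under 3-way replication
```

Real erasure-coded storage systems use more general (n, k) codes (e.g., Reed-Solomon) to tolerate multiple failures; the XOR construction above is the simplest special case.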
The content of this talk is designed to be accessible to an audience with elementary knowledge of computing systems. No prior knowledge of distributed computing theory or erasure coding theory will be assumed.