Global QuickSearch:   Number of matching entries: 0

Search Settings

    BibTeX-Key Author / Editor / Organization Title Year Journal / Proceedings / Book BibTeX type Keywords
    Abouzeid2009 Abouzeid, A.; Pawlikowski, K.B.; Abadi, D.J.; Rasin, A. & Silberschatz, A. HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads 2009 PVLDB
    Vol. 2 (1) , pp. 922-933  
    article databases, hadoop, map-reduce
    Abstract: The production environment for analytical data management applications is rapidly changing. Many enterprises are shifting away from deploying their analytical databases on high-end proprietary machines, and moving towards cheaper, lower-end, commodity hardware, typically arranged in a shared-nothing MPP architecture, often in a virtualized environment inside public or private “clouds”. At the same time, the amount of data that needs to be analyzed is exploding, requiring hundreds to thousands of machines to work in parallel to perform the analysis.

    There tend to be two schools of thought regarding what technology to use for data analysis in such an environment. Proponents of parallel databases argue that the strong emphasis on performance and efficiency of parallel databases makes them wellsuited to perform such analysis. On the other hand, others argue that MapReduce-based systems are better suited due to their superior scalability, fault tolerance, and flexibility to handle unstructured data. In this paper, we explore the feasibility of building a hybrid system that takes the best features from both technologies; the prototype we built approaches parallel databases in performance and efficiency, yet still yields the scalability, fault tolerance, and flexibility of MapReduce-based systems.

    BibTeX:
    @article{Abouzeid2009,
      author = {Abouzeid, Azza and Pawlikowski, Kamil B. and Abadi, Daniel J. and Rasin, Alexander and Silberschatz, Avi},
      title = {HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads},
      journal = {PVLDB},
      year = {2009},
      volume = {2},
      number = {1},
      pages = {922--933},
      url = {http://db.cs.yale.edu/hadoopdb/hadoopdb.html}
    }
    					
    Abraham2004 Abraham, I.; Chockler, G.V.; Keidar, I. & Malkhi, D. Byzantine disk paxos: optimal resilience with byzantine shared memory 2004 PODC '04: Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing , pp. 226-235   inproceedings
    BibTeX:
    @inproceedings{Abraham2004,
      author = {Abraham, Ittai and Chockler, Gregory V. and Keidar, Idit and Malkhi, Dahlia},
      title = {Byzantine disk paxos: optimal resilience with byzantine shared memory},
      booktitle = {PODC '04: Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing},
      publisher = {ACM},
      year = {2004},
      pages = {226--235},
      doi = {http://doi.acm.org/10.1145/1011767.1011801}
    }
    					
    Aguilera2008 Aguilera, M.K.; Golab, W. & Shah, M.A. A practical scalable distributed B-tree 2008 Proc. VLDB Endow.
    Vol. 1 (1) , pp. 598-609  
    article b-tree, database, distributed
    Abstract: Internet applications increasingly rely on scalable data structures that must support high throughput and store huge amounts of data. These data structures can be hard to implement efficiently. Recent proposals have overcome this problem by giving up on generality and implementing specialized interfaces and functionality (e.g., Dynamo [4]). We present the design of a more general and flexible solution: a fault-tolerant and scalable distributed B-tree. In addition to the usual B-tree operations, our B-tree provides some important practical features: transactions for atomically executing several operations in one or more B-trees, online migration of B-tree nodes between servers for load-balancing, and dynamic addition and removal of servers for supporting incremental growth of the system.
    BibTeX:
    @article{Aguilera2008,
      author = {Aguilera, Marcos K. and Golab, Wojciech and Shah, Mehul A.},
      title = {A practical scalable distributed B-tree},
      journal = {Proc. VLDB Endow.},
      publisher = {VLDB Endowment},
      year = {2008},
      volume = {1},
      number = {1},
      pages = {598--609},
      url = {http://www.hpl.hp.com/techreports/2007/HPL-2007-193.html},
      doi = {http://doi.acm.org/10.1145/1453856.1453922}
    }
    					
    Aguilera2007 Aguilera, M.K.; Merchant, A.; Shah, M.; Veitch, A. & Karamanolis, C. Sinfonia: a new paradigm for building scalable distributed systems 2007 SOSP '07: Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles , pp. 159-174   inproceedings 2007
    BibTeX:
    @inproceedings{Aguilera2007,
      author = {Aguilera, Marcos K. and Merchant, Arif and Shah, Mehul and Veitch, Alistair and Karamanolis, Christos},
      title = {Sinfonia: a new paradigm for building scalable distributed systems},
      booktitle = {SOSP '07: Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles},
      publisher = {ACM Press},
      year = {2007},
      pages = {159--174},
      url = {http://www.ssrc.ucsc.edu/PaperArchive/aguilera-sosp07.pdf},
      doi = {http://dx.doi.org/10.1145/1294261.1294278}
    }
    					
    Armstrong2007 Armstrong, J. A History of Erlang 2007 HOPL III: Proceedings of the third ACM SIGPLAN conference on History of programming languages , pp. 6-1-6-26   inproceedings
    Review: Read
    BibTeX:
    @inproceedings{Armstrong2007,
      author = {Joe Armstrong},
      title = {A History of Erlang},
      booktitle = {HOPL III: Proceedings of the third ACM SIGPLAN conference on History of programming languages},
      publisher = {ACM},
      year = {2007},
      pages = {6-1--6-26},
      url = {http://www.cs.chalmers.se/Cs/Grundutb/Kurser/ppxt/HT2007/general/languages/armstrong-erlang_history.pdf},
      doi = {http://doi.acm.org/10.1145/1238844.1238850}
    }
    					
    Armstrong2007a Armstrong, J. Programming Erlang: Software for a Concurrent World 2007 Paperback  book erlang, functional-programming
    Abstract: Erlang solves one of the most pressing problems facing developers today: how to write reliable, concurrent, high-performance systems. It's used worldwide by companies who need to produce reliable, efficient, and scalable applications. Invest in learning Erlang now.

    Moore's Law is the observation that the amount you can do on a single chip doubles every two years. But Moore's Law is taking a detour. Rather than producing faster and faster processors, companies such as Intel and AMD are producing multi-core devices: single chips containing two, four, or more processors. If your programs aren't concurrent, they'll only run on a single processor at a time. Your users will think that your code is slow.

    Erlang is a programming language designed for building highly parallel, distributed, fault-tolerant systems. It has been used commercially for many years to build massive fault-tolerated systems that run for years with minimal failures.

    Erlang programs run seamlessly on multi-core computers: this means your Erlang program should run a lot faster on a 4 core processor than on a single core processor, all without you having to change a line of code.

    Erlang combines ideas from the world of functional programming with techniques for building fault-tolerant systems to make a powerful language for building the massively parallel, networked applications of the future.

    This book presents Erlang and functional programming in the familiar Pragmatic style. And it's written by Joe Armstrong, one of the creators of Erlang.

    It includes example code you'll be able to build upon. In addition, the book contains the full source code for two interesting applications:

    - A SHOUTcast server which you can use to stream music to every computer in your house, and

    - a full-text indexing and search engine that can index gigabytes of data.

    Learn how to write programs that run on dozens or even hundreds of local and remote processors. See how to write robust applications that run even in the face of network and hardware failure, using the Erlang programming language.

    Review: Read
    BibTeX:
    @book{Armstrong2007a,
      author = {Armstrong, Joe},
      title = {Programming Erlang: Software for a Concurrent World},
      publisher = {Pragmatic Bookshelf},
      year = {2007},
      url = {http://www.amazon.com/exec/obidos/redirect?tag=citeulike07-20&path=ASIN/193435600X}
    }
    					
    Armstrong2003 Armstrong, J. Making Reliable Distributed Systems in the Presence of Software Errors 2003 School: The Royal Institute of Technology, Stockholm, Sweden   phdthesis
    Abstract: The work described in this thesis is the result of a research program started in 1981 to find better ways of programming Telecom applications. These applications are large programs which despite careful testing will probably contain many errors when the program is put into service. We assume that such programs do contain errors, and investigate methods for building reliable systems despite such errors. The research has resulted in the development of a new programming language (called Erlang), together with a design methodology, and set of libraries for building robust systems (called OTP). At the time of writing the technology described here is used in a number of major Ericsson, and Nortel products. A number of small companies have also been formed which exploit the technology. The central problem addressed by this thesis is the problem of constructing reliable systems from programs which may themselves contain errors. Constructing such systems imposes a number of requirements on any programming language that is to be used for the construction. I discuss these language requirements, and show how they are satisfied by Erlang. Problems can be solved in a programming language, or in the standard libraries which accompany the language. I argue how certain of the requirements necessary to build a fault-tolerant system are solved in the language, and others are solved in the standard libraries. Together these form a basis for building fault-tolerant sodware systems. No theory is complete without proof that the ideas work in practice. To demonstrate that these ideas work in practice I present a number of case studies of large commercially successful products which use this technology. At the time of writing the largest of these projects is a major Ericsson product, having over a million lines of Erlang code. This product (the AXD301) is thought to be one of the most reliable products ever made by Ericsson. Finally, I ask if the goal of finding better ways to program Telecom applications was fulfilled--I also point to areas where I think the system could be improved.
    Review: Read
    BibTeX:
    @phdthesis{Armstrong2003,
      author = {Joe Armstrong},
      title = {Making Reliable Distributed Systems in the Presence of Software Errors},
      school = {The Royal Institute of Technology, Stockholm, Sweden},
      year = {2003},
      url = {http://www.sics.se/~joe/thesis/armstrong_thesis_2003.pdf}
    }
    					
    Armstrong1996 Armstrong, J.; Virding, R.; Wikström, C. & Williams, M. Concurrent Programming in ERLANG 1996   book concurrency, embedded-systems
    Review: Half read
    BibTeX:
    @book{Armstrong1996,
      author = {Armstrong, Joe and Virding, Robert and Wikström, Claes and Williams, Mike},
      title = {Concurrent Programming in ERLANG},
      publisher = {Prentice Hall},
      year = {1996},
      url = {http://erlang.org/download/erlang-book-part1.pdf}
    }
    					
    Arpaci-Dusseau1997 Arpaci-Dusseau, A.C.; Arpaci-Dusseau, R.H.; Culler, D.E.; Hellerstein, J.M. & Patterson, D.P. High-Performance Sorting on Networks of Workstations 1997 Proceedings of the 1997 ACM SIGMOD Conference , pp. 243-254   inproceedings
    Abstract: We report the performance of NOW-Sort, a collection of sorting implementations on a Network of Workstations (NOW). We find that parallel sorting on NOWs is competitive to sorting on the large-scale SMPs that have traditionally held the performance records. On a 32-node cluster, we finish the Datamation benchmark in 2.41 seconds, and can sort 6.0 GB in just under one minute. On a smaller, better equipped, 8-node cluster, we run the Datamation in 2.92 seconds, and sort 1.4 GB in a minute. Our implementations can be applied to a variety of disk, memory, and processor configurations; we highlight salient issues for tuning each component of the system. Throughout the paper, we evaluate the use of commodity hardware and operating systems for parallel sorting, and note lessons that can be drawn when applying NOW technology to data-intensive applications.
    BibTeX:
    @inproceedings{Arpaci-Dusseau1997,
      author = {Arpaci-Dusseau, Andrea C. and Arpaci-Dusseau, Remzi H. and Culler, David E. and Hellerstein, Joseph M. and Patterson, David P.},
      title = {High-Performance Sorting on Networks of Workstations},
      booktitle = {Proceedings of the 1997 ACM SIGMOD Conference},
      year = {1997},
      pages = {243--254},
      url = {http://now.cs.berkeley.edu/NowSort/}
    }
    					
    Belsnes1976 Belsnes, D. Single-Message Communication 1976 IEEE Transaction on Communications
    Vol. COM-24 , pp. 190-194  
    article
    Abstract: When a communication system is used to transmit many short messages, it is important to reduce the amount of control overhead for creation and destruction of logic process-to-process connections and for reliable communication. Different end-to-end control procedures are described, and they are studied with respect to the possibility of losing a message or accepting a duplicate. It is shown (under certain assumptions about the communication network) that all end-to-end protocols either allow for loss of a message or can deliver duplicates of a message.
    BibTeX:
    @article{Belsnes1976,
      author = {Dag Belsnes},
      title = {Single-Message Communication},
      journal = {IEEE Transaction on Communications},
      year = {1976},
      volume = {COM-24},
      pages = {190--194},
      url = {http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1093283}
    }
    					
    Benjamin2010 Benjamin H. Sigelman, Luiz André Barroso, M.B.P.S.M.P.D.B.S.J.C.S. Dapper, a Large-Scale Distributed Systems Tracing Infrastructure 2010   techreport
    Abstract: Modern Internet services are often implemented as complex, large-scale distributed systems. These applications are constructed from collections of software modules that may be developed by different teams, perhaps in different programming languages, and could span many thousands of machines across multiple physical facili- ties. Tools that aid in understanding system behavior and reasoning about performance issues are invaluable in such an environment.

    Here we introduce the design of Dapper, Google’s production distributed systems tracing infrastructure, and describe how our design goals of low overhead, application-level transparency, and ubiquitous deployment on a very large scale system were met. Dapper shares conceptual similarities with other tracing systems, particularly Magpie [3] and X-Trace [12], but certain design choices were made that have been key to its success in our environment, such as the use of sampling and restricting the instrumentation to a rather small number of common libraries.

    The main goal of this paper is to report on our experience building, deploying and using the system for over two years, since Dapper’s foremost measure of success has been its usefulness to developer and operations teams. Dapper began as a self-contained tracing tool but evolved into a monitoring platform which has enabled the creation of many different tools, some of which were not anticipated by its designers. We describe a few of the analysis tools that have been built using Dapper, share statistics about its usage within Google, present some example use cases, and discuss lessons learned so far.

    BibTeX:
    @techreport{Benjamin2010,
      author = {Benjamin H. Sigelman, Luiz André Barroso, Mike Burrows, Pat Stephenson, Manoj Plakal, Donald Beaver, Saul Jaspan, Chandan Shanbhag},
      title = {Dapper, a Large-Scale Distributed Systems Tracing Infrastructure},
      year = {2010},
      url = {http://research.google.com/archive/papers/dapper-2010-1.pdf}
    }
    					
    Bump Bump, D. Mathematics of the Rubik's Cube   booklet
    Review: Half read
    BibTeX:
    @booklet{Bump,
      author = {Daniel Bump},
      title = {Mathematics of the Rubik's Cube},
      url = {http://match.stanford.edu/bump/rubik.html}
    }
    					
    Burrows2006 Burrows, M. The Chubby lock service for loosely-coupled distributed systems 2006 OSDI '06: Proceedings of the 7th symposium on Operating systems design and implementation , pp. 335-350   inproceedings 2006, chubby, google, lock
    Abstract: We describe our experiences with the Chubby lock service, which is intended to provide coarse-grained locking as well as reliable (though low-volume) storage for a loosely-coupled distributed system. Chubby provides an interface much like a distributed file system with advisory locks, but the design emphasis is on availability and reliability, as opposed to high performance. Many instances of the service have been used for over a year, with several of them each handling a few tens of thousands of clients concurrently. The paper describes the initial design and expected use, compares it with actual use, and explains how the design had to be modified to accommodate the differences.
    Review: Read
    BibTeX:
    @inproceedings{Burrows2006,
      author = {Burrows, Mike},
      title = {The Chubby lock service for loosely-coupled distributed systems},
      booktitle = {OSDI '06: Proceedings of the 7th symposium on Operating systems design and implementation},
      publisher = {USENIX Association},
      year = {2006},
      pages = {335--350},
      url = {http://labs.google.com/papers/chubby-osdi06.pdf}
    }
    					
    Castro1999 Castro & Liskov Practical Byzantine Fault Tolerance 1999 OSDI: Symposium on Operating Systems Design and Implementation   inproceedings fs
    Abstract: This paper describes a new replication algorithm that is able to tolerate Byzantine faults. We believe that Byzantinefault-tolerant algorithms will be increasingly important in the future because malicious attacks and software errors are increasingly common and can cause faulty nodes to exhibit arbitrary behavior. Whereas previous algorithms assumed a synchronous system or were too slow to be used in practice, the algorithm described in this paper is practical: it works in asynchronous environments like the Internet and incorporates several important optimizations that improve the response time of previous algorithms by more than an order of magnitude. We implemented a Byzantine-fault-tolerant NFS service using our algorithm and measured its performance. The results show that our service is only 3% slower than a standard unreplicated NFS.
    BibTeX:
    @inproceedings{Castro1999,
      author = {Castro and Liskov},
      title = {Practical Byzantine Fault Tolerance},
      booktitle = {OSDI: Symposium on Operating Systems Design and Implementation},
      publisher = {USENIX Association, Co-sponsored by IEEE TCOS and ACM SIGOPS},
      year = {1999},
      url = {http://research.microsoft.com/en-us/um/people/mcastro/publications/osdi99.pdf}
    }
    					
    Castro2002 Castro, M. & Liskov, B. Practical Byzantine Fault Tolerance and Proactive Recovery 2002 ACM Transactions on Computer Systems (TOCS)
    Vol. 20 (4) , pp. 398-461  
    article
    Abstract: Our growing reliance on online services accessible on the Internet demands highly available systems that provide correct service without interruptions. Software bugs, operator mistakes, and malicious attacks are a major cause of service interruptions and they can cause arbitrary behavior, that is, Byzantine faults. This article describes a new replication algorithm, BFT, that can be used to build highly available systems that tolerate Byzantine faults. BFT can be used in practice to implement real services: it performs well, it is safe in asynchronous environments such as the Internet, it incorporates mechanisms to defend against Byzantine-faulty clients, and it recovers replicas proactively. The recovery mechanism allows the algorithm to tolerate any number of faults over the lifetime of the system provided fewer than 1/3 of the replicas become faulty within a small window of vulnerability. BFT has been implemented as a generic program library with a simple interface. We used the library to implement the first Byzantine-fault-tolerant NFS file system, BFS. The BFT library and BFS perform well because the library incorporates several important optimizations, the most important of which is the use of symmetric cryptography to authenticate messages. The performance results show that BFS performs 2%; faster to 24%; slower than production implementations of the NFS protocol that are not replicated. This supports our claim that the BFT library can be used to build practical systems that tolerate Byzantine faults.
    BibTeX:
    @article{Castro2002,
      author = {Miguel Castro and Barbara Liskov},
      title = {Practical Byzantine Fault Tolerance and Proactive Recovery},
      journal = {ACM Transactions on Computer Systems (TOCS)},
      year = {2002},
      volume = {20},
      number = {4},
      pages = {398--461},
      url = {http://research.microsoft.com/en-us/um/people/mcastro/publications/p398-castro-bft-tocs.pdf}
    }
    					
    Cesarini2009 Cesarini, F. & Thompson, S. Erlang Programming 2009 , pp. 494   book erlang programming
    Abstract: Erlang was designed for writing concurrent programs that "run forever." Erlang uses concurrent processes to structure the program. These processes have no shared memory and communicate by asynchronous message passing. Erlang processes are lightweight and belong to the language, not the operating system. Erlang has mechanisms to allow programs to change code "on the fly" so that programs can evolve and change as they run. These mechanisms simplify the construction of software for implementing non-stop systems.

    This paper describes the history of Erlang. Material for the paper comes from a number of different sources. These include personal recollections, discussions with colleagues, old newspaper articles and scanned copies of Erlang manuals, photos and computer listings and articles posted to Usenet mailing lists.

    BibTeX:
    @book{Cesarini2009,
      author = {Francesco Cesarini and Simon Thompson},
      title = {Erlang Programming},
      publisher = {O'Reilly},
      year = {2009},
      pages = {494},
      url = {http://www.cs.kent.ac.uk/pubs/2009/2925}
    }
    					
    Chandra2007 Chandra, T.D.; Griesemer, R. & Redstone, J. Paxos made live: an engineering perspective 2007 PODC '07: Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing , pp. 398-407   inproceedings fault-tolerance, paxos
    Abstract: We describe our experience building a fault-tolerant data-base using the Paxos consensus algorithm. Despite the existing literature in the field, building such a database proved to be non-trivial. We describe selected algorithmic and engineering problems encountered, and the solutions we found for them. Our measurements indicate that we have built a competitive system.
    Review: Read
    BibTeX:
    @inproceedings{Chandra2007,
      author = {Chandra, Tushar D. and Griesemer, Robert and Redstone, Joshua},
      title = {Paxos made live: an engineering perspective},
      booktitle = {PODC '07: Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing},
      publisher = {ACM Press},
      year = {2007},
      pages = {398--407},
      url = {http://www.chandrakin.com/paper2.pdf},
      doi = {http://doi.acm.org/10.1145/1281100.1281103}
    }
    					
    Chang2006 Chang, F.; Dean, J.; Ghemawat, S.; Hsieh, W.C.; Wallach, D.A.; Burrows, M.; Chandra, T.; Fikes, A. & Gruber, R.E. Bigtable: A Distributed Storage System for Structured Data 2006 OSDI'06: Seventh Symposium on Operating System Design and Implementation   conference
    Abstract: Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this paper we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.
    Review: Read
    BibTeX:
    @conference{Chang2006,
      author = {Fay Chang and Jeffrey Dean and Sanjay Ghemawat and Wilson C. Hsieh and Deborah A. Wallach and Mike Burrows and Tushar Chandra and Andrew Fikes and Robert E. Gruber},
      title = {Bigtable: A Distributed Storage System for Structured Data},
      booktitle = {OSDI'06: Seventh Symposium on Operating System Design and Implementation},
      year = {2006},
      url = {http://labs.google.com/papers/bigtable-osdi06.pdf}
    }
    					
    Chen Chen, J. Group Theory and the Rubik’s Cube   booklet
    BibTeX:
    @booklet{Chen,
      author = {Janet Chen},
      title = {Group Theory and the Rubik’s Cube},
      url = {http://www.math.harvard.edu/~jjchen/docs/Group%20Theory%20and%20the%20Rubik%27s%20Cube.pdf}
    }
    					
    Dean2004 Dean, J. & Ghemawat, S. MapReduce: simplified data processing on large clusters 2004 Commun. ACM
    Vol. 51 (1) , pp. 107-113  
    article
    Abstract: MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.
    Review: Half read
    BibTeX:
    @article{Dean2004,
      author = {Dean, Jeffrey and Ghemawat, Sanjay},
      title = {MapReduce: simplified data processing on large clusters},
      journal = {Commun. ACM},
      publisher = {ACM},
      year = {2004},
      volume = {51},
      number = {1},
      pages = {107--113},
      url = {http://labs.google.com/papers/mapreduce-osdi04.pdf},
      doi = {http://doi.acm.org/10.1145/1327452.1327492}
    }
    					
    Decandia2007 Decandia, G.; Hastorun, D.; Jampani, M.; Kakulapati, G.; Lakshman, A.; Pilchin, A.; Sivasubramanian, S.; Vosshall, P. & Vogels, W. Dynamo: amazon's highly available key-value store 2007 SOSP '07: Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles , pp. 205-220   inproceedings amazon, dynamo, sosp
    BibTeX:
    @inproceedings{Decandia2007,
      author = {Decandia, Giuseppe and Hastorun, Deniz and Jampani, Madan and Kakulapati, Gunavardhan and Lakshman, Avinash and Pilchin, Alex and Sivasubramanian, Swaminathan and Vosshall, Peter and Vogels, Werner},
      title = {Dynamo: amazon's highly available key-value store},
      booktitle = {SOSP '07: Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles},
      publisher = {ACM Press},
      year = {2007},
      pages = {205--220},
      url = {http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf},
      doi = {http://dx.doi.org/10.1145/1294261.1294281}
    }
    					
    Eisenhauer2006 Eisenhauer, G.; Schwan, K. & Bustamante, F. Publish-Subscribe for High-Performance Computing 2006 IEEE Internet Computing
    Vol. 10 (1) , pp. 40-47  
    article event-based-systems, publish-subscribe
    Review: Read
    BibTeX:
    @article{Eisenhauer2006,
      author = {Eisenhauer, Greg and Schwan, Karsten and Bustamante, Fabian},
      title = {Publish-Subscribe for High-Performance Computing},
      journal = {IEEE Internet Computing},
      publisher = {IEEE Educational Activities Department},
      year = {2006},
      volume = {10},
      number = {1},
      pages = {40--47},
      url = {http://wwwse.inf.tu-dresden.de/wiki/images/7/7b/Seminar_ws06_zhang_xianwen_presentation.pdf},
      doi = {http://dx.doi.org/10.1109/MIC.2006.16}
    }
    					
    Fischer1985 Fischer, M.J.; Lynch, N.A. & Paterson, M.S. Impossibility of distributed consensus with one faulty process 1985 J. ACM
    Vol. 32 (2) , pp. 374-382  
    article distributed-systems
    Abstract: The consensus problem involves an asynchronous system of processes, some of which may be unreliable. The problem is for the reliable processes to agree on a binary value. In this paper, it is shown that every protocol for this problem has the possibility of nontermination, even with only one faulty process. By way of contrast, solutions are known for the synchronous case, the "Byzantine Generals" prblem.
    Review: Reading
    BibTeX:
    @article{Fischer1985,
      author = {Fischer, Michael J. and Lynch, Nancy A. and Paterson, Michael S.},
      title = {Impossibility of distributed consensus with one faulty process},
      journal = {J. ACM},
      publisher = {ACM Press},
      year = {1985},
      volume = {32},
      number = {2},
      pages = {374--382},
      url = {http://cs-www.cs.yale.edu/homes/arvind/cs425/doc/fischer.pdf},
      doi = {http://dx.doi.org/10.1145/3149.214121}
    }
    					
    Gafni2003 Gafni, E. & Lamport, L. Disk Paxos 2003 Distrib. Comput.
    Vol. 16 (1) , pp. 1-20  
    article
    BibTeX:
    @article{Gafni2003,
      author = {Gafni, Eli and Lamport, Leslie},
      title = {Disk Paxos},
      journal = {Distrib. Comput.},
      publisher = {Springer-Verlag},
      year = {2003},
      volume = {16},
      number = {1},
      pages = {1--20},
      doi = {http://dx.doi.org/10.1007/s00446-002-0070-8}
    }
    					
    Ghemawat2003 Ghemawat, S.; Gobioff, H. & Leung, S. The Google File System 2003 19th ACM Symposium on Operating Systems Principles   conference
    Abstract: We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients.

    While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to reexamine traditional choices and explore radically different design points.

    The file system has successfully met our storage needs. It is widely deployed within Google as the storage platform for the generation and processing of data used by our service as well as research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients.

    In this paper, we present file system interface extensions designed to support distributed applications, discuss many aspects of our design, and report measurements from both micro-benchmarks and real world use.

    Review: Half read
    BibTeX:
    @conference{Ghemawat2003,
      author = {Sanjay Ghemawat and Howard Gobioff and and Shun-Tak Leung},
      title = {The Google File System},
      booktitle = {19th ACM Symposium on Operating Systems Principles},
      year = {2003},
      url = {http://labs.google.com/papers/gfs-sosp2003.pdf}
    }
    					
    Gilbert2002 Gilbert, S. & Lynch, N. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services 2002 SIGACT News
    Vol. 33 (2) , pp. 51-59  
    article
    BibTeX:
    @article{Gilbert2002,
      author = {Gilbert, Seth and Lynch, Nancy},
      title = {Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services},
      journal = {SIGACT News},
      publisher = {ACM},
      year = {2002},
      volume = {33},
      number = {2},
      pages = {51--59},
      doi = {http://doi.acm.org/10.1145/564585.564601}
    }
    					
    Hayashibara2002 Hayashibara, N.; Schiper, A.; Urban, P.; Katayama, T.; Polytechnique, É. & Lausanne, F. Performance Comparison between the Paxos and Chandra-Toueg Consensus Algorithms 2002 In: Proc. of International Arab Conference on Information Technology , pp. 526-533   inproceedings
    Abstract: Protocols which solve agreement problems are essential building blocks for fault tolerant distributed applications. While many protocols have been published, little has been done to analyze their performance. This paper represents a starting point for such studies, by focusing on the consensus problem, a problem related to most other agreement problems. The paper compares the latency of two consensus algorithms designed for the asynchronous model with failure detectors: the Paxos algorithm and the Chandra-Toueg algorithm. We varied the number of processes which take part in the execution. Moreover, we evaluated the latency in different classes of runs: (1) runs with no failures nor failure suspicions, (2) runs with failures but no wrong suspicions. We determined the latency by measurements on a cluster of PCs interconnected with a 100 Mbps Ethernet network. We found that the Paxos algorithm is more efficient than the Chandra-Toueg algorithm when the process that coordinates the first round of the protocol crashes. The two algorithms have almost the same performance in all other cases.
    BibTeX:
    @inproceedings{Hayashibara2002,
      author = {Naohiro Hayashibara and Andre Schiper and Peter Urban and Takuya Katayama and École Polytechnique and Fédérale Lausanne},
      title = {Performance Comparison between the Paxos and Chandra-Toueg Consensus Algorithms},
      booktitle = {In: Proc. of International Arab Conference on Information Technology},
      publisher = {Online]. Available: http://lsrwww.epfl.ch/Publications/ById/346.html},
      year = {2002},
      pages = {526--533},
      url = {http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.2.6966}
    }
    					
    Hudak2007 Hudak, P.; Hughes, J.; Jones, S.P. & Wadler, P. A history of Haskell: being lazy with class 2007 HOPL III: Proceedings of the third ACM SIGPLAN conference on History of programming languages , pp. 12-1-12-55   inproceedings
    Abstract: This paper describes the history of Haskell, including its genesis and principles, technical contributions, implementations and tools, and applications and impact.
    BibTeX:
    @inproceedings{Hudak2007,
      author = {Hudak, Paul and Hughes, John and Jones, Simon Peyton and Wadler, Philip},
      title = {A history of Haskell: being lazy with class},
      booktitle = {HOPL III: Proceedings of the third ACM SIGPLAN conference on History of programming languages},
      publisher = {ACM},
      year = {2007},
      pages = {12-1--12-55},
      doi = {http://doi.acm.org/10.1145/1238844.1238856}
    }
    					
    Hunt2010 Hunt, P.; Konar, M.J.F.R.B. ZooKeeper: Wait-free coordination for Internet-scale systems 2010 USENIX Annual Technology Conference (2010)   conference
    Abstract: In this paper, we describe ZooKeeper, a service for coordinating processes of distributed applications. Since ZooKeeper is part of critical infrastructure, ZooKeeper aims to provide a simple and high performance kernel for building more complex coordination primitives at the client. It incorporates elements from group messaging, shared registers, and distributed lock services in a replicated, centralized service. The interface exposed by Zoo- Keeper has the wait-free aspects of shared registers with an event-driven mechanism similar to cache invalidations of distributed file systems to provide a simple, yet powerful coordination service. The ZooKeeper interface enables a high-performance service implementation. In addition to the wait-free property, ZooKeeper provides a per client guarantee of FIFO execution of requests and linearizability for all requests that change the ZooKeeper state. These design decisions enable the implementation of a high performance processing pipeline with read requests being satisfied by local servers. We show for the target workloads, 2:1 to 100:1 read to write ratio, that ZooKeeper can handle tens to hundreds of thousands of transactions per second. This performance allows ZooKeeper to be used extensively by client applications.
    BibTeX:
    @conference{Hunt2010,
      author = {Hunt, P.; Konar, M.; Junqueira, F.P.; Reed, B.},
      title = {ZooKeeper: Wait-free coordination for Internet-scale systems},
      booktitle = {USENIX Annual Technology Conference (2010)},
      year = {2010}
    }
    					
    Ian2009 Ian Thomas Varley, M. No Relation: The Mixed Blessings of Non-Relational Databases 2009 School: The University of Texas at Austin   mastersthesis
    Abstract: This paper investigates a new class of database systems loosely referred to as "non-relational databases," which offer a subset of traditional relational database functionality, in exchange for improved scalability, performance, and / or simplicity. We explore the differences in conceptual modeling techniques, and examine both the advantages and limitations of several classes of currently available systems, using running examples of real-world problems as implemented in both a traditional relational database model, as well as several non-relational models.
    BibTeX:
    @mastersthesis{Ian2009,
      author = {Ian Thomas Varley, M.S.E.},
      title = {No Relation: The Mixed Blessings of Non-Relational Databases},
      school = {The University of Texas at Austin},
      year = {2009}
    }
    					
    Isard2007 Isard, M.; Budiu, M.; Yu, Y.; Birrell, A. & Fetterly, D. Dryad: distributed data-parallel programs from sequential building blocks 2007 EuroSys '07: Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007 , pp. 59-72   inproceedings mapreduce
    Abstract: Dryad is a general-purpose distributed execution engine for coarse-grain data-parallel applications. A Dryad application combines computational "vertices" with communication "channels" to form a dataflow graph. Dryad runs the application by executing the vertices of this graph on a set of available computers, communicating as appropriate through flies, TCP pipes, and shared-memory FIFOs.
    BibTeX:
    @inproceedings{Isard2007,
      author = {Isard, Michael and Budiu, Mihai and Yu, Yuan and Birrell, Andrew and Fetterly, Dennis},
      title = {Dryad: distributed data-parallel programs from sequential building blocks},
      booktitle = {EuroSys '07: Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007},
      publisher = {ACM},
      year = {2007},
      pages = {59--72},
      url = {http://research.microsoft.com/en-us/projects/dryad/eurosys07.pdf},
      doi = {http://dx.doi.org/10.1145/1272996.1273005}
    }
    					
    Karger1997 Karger, D.; Lehman, E.; Leighton, T.; Panigrahy, R.; Levine, M. & Lewin, D. Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web 1997 STOC '97: Proceedings of the twenty-ninth annual ACM symposium on Theory of computing , pp. 654-663   inproceedings
    Abstract: We describe a family of caching protocols for distributed networks that can be used to decrease or eliminate the occurrence of hot spots in the network. Our protocols are particularly designed for use with very large networks such as the Internet, where delays caused by hot spots can be severe, and where it is not feasible for every server to have complete information about the current state of the entire network. The protocols are easy to implement using existing network protocols such as TCP/IP, and require very little overhead. The protocols work with local control, make efficient use of existing resources, and scale gracefully as the network grows. Our caching protocols are based on a special kind of hashing that we call consistent hashing. Roughly speaking, a consistent hash function is one which changes minimally as the range of the function changes. Through the development of good consistent hash functions, we are able to develop caching protocols which do not require users to have a current or even consistent view of the network. We believe that consistent hash functions may eventually prove to be useful in other applications such as distributed name servers and/or quorum systems.
    BibTeX:
    @inproceedings{Karger1997,
      author = {Karger, David and Lehman, Eric and Leighton, Tom and Panigrahy, Rina and Levine, Matthew and Lewin, Daniel},
      title = {Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web},
      booktitle = {STOC '97: Proceedings of the twenty-ninth annual ACM symposium on Theory of computing},
      publisher = {ACM},
      year = {1997},
      pages = {654--663},
      url = {http://www.akamai.com/dl/technical_publications/ConsistenHashingandRandomTreesDistributedCachingprotocolsforrelievingHotSpotsontheworldwideweb.pdf},
      doi = {http://doi.acm.org/10.1145/258533.258660}
    }
    					
    Kirsch2008 Kirsch, J. & Amir, Y. Paxos for System Builders: an overview 2008 LADIS '08: Proceedings of the 2nd Workshop on Large-Scale Distributed Systems and Middleware

    [sjcui] Isn't it only a tech-report? , pp. 1-6  

    inproceedings distributed, paxos
    Abstract: This paper presents an overview of Paxos for System Builders, a complete specification of the Paxos replication protocol such that system builders can understand it and implement it. We evaluate the performance of a prototype implementation and detail the safety and liveness properties guaranteed by our specification of Paxos.
    BibTeX:
    @inproceedings{Kirsch2008,
      author = {Kirsch, Jonathan and Amir, Yair},
      title = {Paxos for System Builders: an overview},
      booktitle = {LADIS '08: Proceedings of the 2nd Workshop on Large-Scale Distributed Systems and Middleware

    [sjcui] Isn't it only a tech-report?}, publisher = {ACM}, year = {2008}, pages = {1--6}, url = {http://www.cs.jhu.edu/~jak/docs/paxos_for_system_builders.pdf}, doi = {http://dx.doi.org/10.1145/1529974.1529979} }

    Knuth1984 Knuth, D. The TeX book 1984   book
    BibTeX:
    @book{Knuth1984,
      author = {Knuth, D.E.},
      title = {The TeX book},
      publisher = {Addison-Wesley},
      year = {1984},
      url = {http://www-cs-faculty.stanford.edu/~knuth/abcde.html}
    }
    					
    Lamport1982 Lamport; Shostak & Pease The Byzantine Generals Problem 1982 Advances in Ultra-Dependable Distributed Systems, N. Suri, C. J. Walter, and M. M. Hugue (Eds.), IEEE Computer Society Press   incollection algorithms, byzantine, security
    Abstract: Reliable computer systems must handle malfunctioning components that give conflicting information to different parts of the system. This situation can be expressed abstractly in terms of a group of generals of the Byzantine army camped with their troops around an enemy city. Communicating only by messenger, the generals must agree upon a common battle plan. However, one of more of them may be traitors who will try to confuse the others. The problem is to find an algorithm to ensure that the...
    BibTeX:
    @incollection{Lamport1982,
      author = {Lamport and Shostak and Pease},
      title = {The Byzantine Generals Problem},
      booktitle = {Advances in Ultra-Dependable Distributed Systems, N. Suri, C. J. Walter, and M. M. Hugue (Eds.), IEEE Computer Society Press},
      publisher = {the Association for Computing Machinery, Inc.},
      year = {1982},
      url = {http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#byz}
    }
    					
    Lamport2006 Lamport, L. Fast Paxos 2006 Distributed Computing
    Vol. 19 (2) , pp. 79-103  
    article
    Abstract: As used in practice, traditional consensus algorithms require three message delays before any process can learn the chosen value. Fast Paxos is an extension of the classic Paxos algorithm that allows the value to be learned in two message delays. How and why the algorithm works are explained informally, and a TLA+ specification of the algorithm appears as an appendix.
    BibTeX:
    @article{Lamport2006,
      author = {Leslie Lamport},
      title = {Fast Paxos},
      journal = {Distributed Computing},
      year = {2006},
      volume = {19},
      number = {2},
      pages = {79-103},
      url = {http://research.microsoft.com/pubs/64624/tr-2005-112.pdf}
    }
    					
    Lamport2002 Lamport, L. Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers 2002   book
    BibTeX:
    @book{Lamport2002,
      author = {Lamport, Leslie},
      title = {Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers},
      publisher = {Addison-Wesley Longman Publishing Co., Inc.},
      year = {2002},
      url = {http://research.microsoft.com/en-us/um/people/lamport/tla/book.html}
    }
    					
    Lamport2001 Lamport, L. Paxos Made Simple 2001 SIGACT News
    Vol. 32 (4) , pp. 51-58  
    article paxos
    Abstract: The Paxos algorithm, when presented in plain English, is very simple.
    Review: Read
    BibTeX:
    @article{Lamport2001,
      author = {Lamport, Leslie},
      title = {Paxos Made Simple},
      journal = {SIGACT News},
      publisher = {ACM},
      year = {2001},
      volume = {32},
      number = {4},
      pages = {51--58},
      url = {http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf},
      doi = {http://dx.doi.org/10.1145/568425.568433}
    }
    					
    Lamport1998 Lamport, L. The part-time parliament 1998 ACM Trans. Comput. Syst.
    Vol. 16 (2) , pp. 133-169  
    article algorithm, algorithms, computing, concensus, distributed, paxos, system, systems
    Abstract: Recent archaeological discoveries on the island of Paxos reveal that the parliament functioned despite the peripatetic propensity of its part-time legislators. The legislators maintained consistentcopies of the parliamentary record, despite their frequent forays from the chamber and the forgetfulness of their messengers. The Paxon parliament's protocol provides a new way of implementing the state-machine approach to the design of distributed systems.
    Review: Half read
    BibTeX:
    @article{Lamport1998,
      author = {Lamport, Leslie},
      title = {The part-time parliament},
      journal = {ACM Trans. Comput. Syst.},
      publisher = {ACM Press},
      year = {1998},
      volume = {16},
      number = {2},
      pages = {133--169},
      url = {http://research.microsoft.com/en-us/um/people/lamport/pubs/lamport-paxos.pdf},
      doi = {http://dx.doi.org/10.1145/279227.279229}
    }
    					
    Lamport1978 Lamport, L. Time, clocks, and the ordering of events in a distributed system 1978 Commun. ACM
    Vol. 21 (7) , pp. 558-565  
    article distributed, lamport, relativity, time
    BibTeX:
    @article{Lamport1978,
      author = {Lamport, Leslie},
      title = {Time, clocks, and the ordering of events in a distributed system},
      journal = {Commun. ACM},
      publisher = {ACM Press},
      year = {1978},
      volume = {21},
      number = {7},
      pages = {558--565},
      url = {https://research.microsoft.com/en-us/um/people/lamport/pubs/time-clocks.pdf},
      doi = {http://dx.doi.org/10.1145/359545.359563}
    }
    					
    Lamport2009 Lamport, L.; Malkhi, D. & Zhou, L. Vertical paxos and primary-backup replication 2009 PODC '09: Proceedings of the 28th ACM symposium on Principles of distributed computing , pp. 312-313   inproceedings
    Abstract: We introduce a class of Paxos algorithms called Vertical Paxos, in which reconfiguration can occur in the middle of reaching agreement on an individual state-machine command. Vertical Paxos algorithms use an auxiliary configuration master that facilitates agreement on reconfiguration. A special case of these algorithms leads to traditional primary-backup protocols. We show how primary-backup systems in current use can be viewed, and shown to be correct, as instances of Vertical Paxos algorithms.
    BibTeX:
    @inproceedings{Lamport2009,
      author = {Lamport, Leslie and Malkhi, Dahlia and Zhou, Lidong},
      title = {Vertical paxos and primary-backup replication},
      booktitle = {PODC '09: Proceedings of the 28th ACM symposium on Principles of distributed computing},
      publisher = {ACM},
      year = {2009},
      pages = {312--313},
      doi = {http://doi.acm.org/10.1145/1582716.1582783}
    }
    					
    Lamport2004 Lamport, L. & Massa, M. Cheap Paxos 2004 DSN '04: Proceedings of the 2004 International Conference on Dependable Systems and Networks (DSN'04)   inproceedings algorithm, algorithms, distributed, paxos, system, systems
    Abstract: Asynchronous algorithms for implementing a fault-tolerantdistributed system, which can make progressdespite the failure of any F processors, require 2F + 1processors. Cheap Paxos, a variant of the Paxos algorithm,guarantees liveness under the additional assumptionthat the set of nonfaulty processors does not"jump around" too fast, but uses only F + 1 main processorsthat actually execute the system and F auxiliaryprocessors that are used only to handle the failure of amain processor. The auxiliary processors take part inreconfiguring the system to remove the failed processor,after which they can remain idle until another mainprocessor fails.
    BibTeX:
    @inproceedings{Lamport2004,
      author = {Lamport, Leslie and Massa, Mike},
      title = {Cheap Paxos},
      booktitle = {DSN '04: Proceedings of the 2004 International Conference on Dependable Systems and Networks (DSN'04)},
      publisher = {IEEE Computer Society},
      year = {2004},
      url = {http://research.microsoft.com/en-us/um/people/lamport/pubs/web-dsn-submission.pdf}
    }
    					
    Lampson2001 Lampson, B. The ABCD's of Paxos 2001 PODC '01: Proceedings of the twentieth annual ACM symposium on Principles of distributed computing   inproceedings paxos
    Abstract: We explain how consensus is used to implement replicated state machines, the general mechanism for fault-tolerance. We describe an abstract version of Lamport's Paxos algorithm for asynchronous consensus. Then we derive the Byzantine, classic, and disk versions of Paxos from the abstract one, show how they are related to each other, discuss the safety, liveness, and performance of each one, and give the abstraction functions and invariants for simulation proofs of safety.
    Review: Reading
    BibTeX:
    @inproceedings{Lampson2001,
      author = {Lampson, Butler},
      title = {The ABCD's of Paxos},
      booktitle = {PODC '01: Proceedings of the twentieth annual ACM symposium on Principles of distributed computing},
      publisher = {ACM Press},
      year = {2001},
      url = {http://courses.csail.mit.edu/6.852/01/papers/PaxosX40.pdf},
      doi = {http://dx.doi.org/10.1145/383962.383969}
    }
    					
    Lampson1996 Lampson, B.W. Babaoglu & Marzullo (Hrsg.) How to Build a Highly Available System Using Consensus 1996
    Vol. 1151 10th International Workshop on Distributed Algorithms (WDAG 96) , pp. 1-17  
    inproceedings cluster-computing, consensus, fault-tolerance, has-note, paxos, software-engineering
    Abstract: Lamport showed that a replicated deterministic state machine is a general way to implement a highly available system, given a consensus algorithm that the replicas can use to agree on each input. His Paxos algorithm is the most fault-tolerant way to get consensus without real-time guarantees. Because general consensus is expensive, practical systems reserve it for emergencies and use leases (locks that time out) for most of the computing. This paper explains the general scheme for efficient...
    BibTeX:
    @inproceedings{Lampson1996,
      author = {Lampson, B. W.},
      title = {How to Build a Highly Available System Using Consensus},
      booktitle = {10th International Workshop on Distributed Algorithms (WDAG 96)},
      publisher = {Springer-Verlag, Berlin Germany},
      year = {1996},
      volume = {1151},
      pages = {1--17},
      url = {http://research.microsoft.com/en-us/um/people/blampson/58-Consensus/Acrobat.pdf}
    }
    					
    Maheshwari1997 Maheshwari, U. HULA: An Efficient Protocol for Reliable Delivery of Messages 1997   techreport
    BibTeX:
    @techreport{Maheshwari1997,
      author = {Maheshwari, U.},
      title = {HULA: An Efficient Protocol for Reliable Delivery of Messages},
      publisher = {Massachusetts Institute of Technology},
      year = {1997},
      url = {http://docs.google.com/publications.csail.mit.edu/lcs/pubs/pdf/MIT-LCS-TR-720.pdf}
    }
    					
    Malkhi2005 Malkhi, D.; Oprea, F. & Zhou, L. Omega Meets Paxos: Leader Election and Stability Without Eventual Timely Links 2005 DISC , pp. 199-213   inproceedings
    Abstract: This paper provides a realization of distributed leader election without having any eventual timely links. Progress is guaranteed in the following weak setting: Eventually one process can send messages such that every message obtains f timely responses, where f is a resilience bound. A crucial facet of this property is that the f responders need not be fixed, and may change from one message to another. In particular, this means that no specific link needs to remain timely. In the (common) case where f = 1, this implies that the FLP impossibility result on consensus is circumvented if one process can at any time communicate in a timely manner with one other process in the system. The protocol also bears significant practical importance to well-known coordination schemes such as Paxos, because our setting more precisely captures the conditions on the elected leader for reaching timely consensus. Additionally, an extension of our protocol provides leader stability, which guarantees against arbitrary demotion of a qualified leader and avoids performance penalties associated with leader changes in schemes such as Paxos.
    BibTeX:
    @inproceedings{Malkhi2005,
      author = {Dahlia Malkhi and Florian Oprea and Lidong Zhou},
      title = {Omega Meets Paxos: Leader Election and Stability Without Eventual Timely Links},
      booktitle = {DISC},
      year = {2005},
      pages = {199-213},
      url = {http://research.microsoft.com/apps/pubs/default.aspx?id=64674}
    }
    					
    Marton2009 Marton Trencseni, Attila Gazso, H.R. PaxosLease: Diskless Paxos for Leases 2009   misc distributed, keyspace, lease, lock, paxos, paxoslease, scalien
    Abstract: This paper describes PaxosLease, a distributed algorithm for lease negotiation. PaxosLease is based on Paxos, but does not require disk writes or clock synchrony. PaxosLease is used for master lease negotation in the open-source Keyspace replicated key-value store.
    BibTeX:
    @misc{Marton2009,
      author = {Marton Trencseni,Attila Gazso,Holger Reinhardt},
      title = {PaxosLease: Diskless Paxos for Leases},
      year = {2009},
      url = {http://scalien.com/whitepapers/}
    }
    					
    Mattsson1998 Mattsson, H.; Nilsson, H. & Wikstrom, C. Mnesia - A Distributed Robust DBMS for Telecommunications Applications 1998 PADL '99: Proceedings of the First International Workshop on Practical Aspects of Declarative Languages , pp. 152-163   inproceedings
    Abstract: The Mnesia DBMS runs in the same adress space as the application owning the data, yet the application cannot destroy the contents of the data base. This provides for both fast accesses and effcient fault tolerance, normally conflicting requirements. The implementation is

    based on features in the Erlang programming language, in which Mnesia is embedded.

    Review: Read
    BibTeX:
    @inproceedings{Mattsson1998,
      author = {Haakan Mattsson and Hans Nilsson and Claes Wikstrom},
      title = {Mnesia - A Distributed Robust DBMS for Telecommunications Applications},
      booktitle = {PADL '99: Proceedings of the First International Workshop on Practical Aspects of Declarative Languages},
      publisher = {Springer-Verlag},
      year = {1998},
      pages = {152--163},
      url = {http://www.erlang.se/publications/mnesia_overview.pdf}
    }
    					
    Mazieres2007 Mazieres, D. Paxos Made Practical 2007   unpublished paxos
    BibTeX:
    @unpublished{Mazieres2007,
      author = {Mazieres, David},
      title = {Paxos Made Practical},
      year = {2007},
      note = {Haven't read it yet.},
      url = {http://www.scs.stanford.edu/~dm/home/papers/paxos.pdf}
    }
    					
    McKusick2009 McKusick, M.K. & Quinlan, S. GFS: Evolution on Fast-forward 2009 Queue
    Vol. 7 (7) , pp. 10-20  
    article
    Review: Half read
    BibTeX:
    @article{McKusick2009,
      author = {McKusick, Marshall Kirk and Quinlan, Sean},
      title = {GFS: Evolution on Fast-forward},
      journal = {Queue},
      publisher = {ACM},
      year = {2009},
      volume = {7},
      number = {7},
      pages = {10--20},
      doi = {http://doi.acm.org/10.1145/1594204.1594206}
    }
    					
    Merkle1988 Merkle, R.C. A Digital Signature Based on a Conventional Encryption Function 1988 CRYPTO '87: A Conference on the Theory and Applications of Cryptographic Techniques on Advances in Cryptology , pp. 369-378   inproceedings
    BibTeX:
    @inproceedings{Merkle1988,
      author = {Merkle, Ralph C.},
      title = {A Digital Signature Based on a Conventional Encryption Function},
      booktitle = {CRYPTO '87: A Conference on the Theory and Applications of Cryptographic Techniques on Advances in Cryptology},
      publisher = {Springer-Verlag},
      year = {1988},
      pages = {369--378},
      url = {http://dsns.csie.nctu.edu.tw/research/crypto/HTML/PDF/C87/369.PDF}
    }
    					
    Michael2003 Michael, M.M. CAS-based lock-free algorithm for shared deques 2003 In the 9th Euro-Par Conference on Parallel Processing , pp. 651-660   inproceedings
    BibTeX:
    @inproceedings{Michael2003,
      author = {Maged M. Michael},
      title = {CAS-based lock-free algorithm for shared deques},
      booktitle = {In the 9th Euro-Par Conference on Parallel Processing},
      publisher = {Springer Verlag},
      year = {2003},
      pages = {651--660},
      url = {http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.93.7492&rep=rep1&type=pdf}
    }
    					
    Perl2006 Perl, S.E. & Seltzer, M. Data management for internet-scale single-sign-on 2006 WORLDS'06: Proceedings of the 3rd conference on USENIX Workshop on Real, Large Distributed Systems , pp. 8   inproceedings distributed, sso
    Abstract: Google offers a variety of Internet services that require user authentication. These services rely on a single-sign-on service, called Google Accounts, that has been in active deployment since 2002. As of 2006, Google has tens of applications with millions of user accounts worldwide. We describe the data management requirements and architecture for this service, the problems we encountered, and the experience we've had running it. In doing so we provide perspective on "where theory meets practice." The success of the system comes from combining good algorithms with practical engineering tradeoffs.
    BibTeX:
    @inproceedings{Perl2006,
      author = {Perl, Sharon E. and Seltzer, Margo},
      title = {Data management for internet-scale single-sign-on},
      booktitle = {WORLDS'06: Proceedings of the 3rd conference on USENIX Workshop on Real, Large Distributed Systems},
      publisher = {USENIX Association},
      year = {2006},
      pages = {8},
      url = {http://www.usenix.org/event/worlds06/tech/prelim_papers/perl/perl.pdf}
    }
    					
    Pike2005 Pike, R.; Dorward, S.; Griesemer, R. & Quinlan, S. Interpreting the data: Parallel analysis with Sawzall 2005 Sci. Program.
    Vol. 13 (4) , pp. 277-298  
    article mapreduce, required
    Abstract: Very large data sets often have a flat but regular structure and span multiple disks and machines. Examples include telephone call records, network logs, and web document repositories. These large data sets are not amenable to study using traditional database techniques, if only because they can be too large to fit in a single relational database. On the other hand, many of the analyses done on them can be expressed using simple, easily distributed computations: filtering, aggregation, extraction of statistics, and so on.

    We present a system for automating such analyses. A filtering phase, in which a query is expressed using a new procedural programming language, emits data to an aggregation phase. Both phases are distributed over hundreds or even thousands of computers. The results are then collated and saved to a file. The design - including the separation into two phases, the form of the programming language, and the properties of the aggregators - exploits the parallelism inherent in having data and computation distributed across many machines.

    BibTeX:
    @article{Pike2005,
      author = {Pike, Rob and Dorward, Sean and Griesemer, Robert and Quinlan, Sean},
      title = {Interpreting the data: Parallel analysis with Sawzall},
      journal = {Sci. Program.},
      publisher = {IOS Press},
      year = {2005},
      volume = {13},
      number = {4},
      pages = {277--298},
      url = {http://labs.google.com/papers/sawzall.html}
    }
    					
    Primi2009 Primi, M. Paxos made code 2009 School: University of Lugano   mastersthesis
    Abstract: The PAXOS algorithm is used to implement Atomic Broadcast, an important communication primitive useful for building fault-tolerant distributed systems. Transforming a formal description into an efficient, scalable and reliable implementation is a difficult process that requires addressing a number of practical issues and making careful design choices. In this document we share our experience in building, verifying and benchmarking different Paxos-based implementations of Atomic Broadcast.
    BibTeX:
    @mastersthesis{Primi2009,
      author = {Marco Primi},
      title = {Paxos made code},
      school = {University of Lugano},
      year = {2009},
      url = {http://libpaxos.sourceforge.net/}
    }
    					
    Prisco2000 Prisco, R.D. & Nancy, L. Revisiting the paxos algorithm 2000 Theoretical Computer Science
    Vol. 243 (1-2) , pp. 35-91  
    article fault-tolerance, paxos
    Abstract: The algorithm is an efficient and highly fault-tolerant algorithm, devised by Lamport, for reaching consensus in a distributed system. Although it appears to be practical, it seems to be not widely known or understood. This paper contains a new presentation of the algorithm, based on a formal decomposition into several interacting components. It also contains a correctness proof and a time performance and fault-tolerance analysis. The formal framework used for the presentation of the algorithm is provided by the Clock General Timed Automaton (Clock GTA) model. The Clock GTA provides a systematic way of describing timing-based systems in which there is a notion of "normal" timing behavior, but that do not necessarily always exhibit this "normal" timing behavior.
    BibTeX:
    @article{Prisco2000,
      author = {Roberto De Prisco and Lynch Nancy},
      title = {Revisiting the paxos algorithm},
      journal = {Theoretical Computer Science},
      year = {2000},
      volume = {243},
      number = {1-2},
      pages = {35--91},
      url = {http://bitsavers.informatik.uni-stuttgart.de/pdf/mit/lcs/tr/MIT-LCS-TR-717.pdf},
      doi = {http://dx.doi.org/10.1016/S0304-3975(00)00042-6}
    }
    					
    Rodeh2008 Rodeh, O. B-trees, shadowing, and clones 2008 Trans. Storage
    Vol. 3 (4) , pp. 1-27  
    article
    BibTeX:
    @article{Rodeh2008,
      author = {Rodeh, Ohad},
      title = {B-trees, shadowing, and clones},
      journal = {Trans. Storage},
      publisher = {ACM},
      year = {2008},
      volume = {3},
      number = {4},
      pages = {1--27},
      doi = {http://doi.acm.org/10.1145/1326542.1326544}
    }
    					
    Saint-Andre2005 Saint-Andre, P. Streaming XML with Jabber/XMPP 2005 IEEE Internet Computing
    Vol. 9 (5) , pp. 82-89  
    article jabber, xmpp
    Abstract: Jabber is an open alternative to closed instant messaging (IM) and presence services. At its core is the Extensible Messaging and Presence Protocol (XMPP), which defines how to stream XML content and is being used to build not only a large open IM network but also a wide range of XML applications. This article provides an overview of Jabber/XMPP protocols and technologies, as well as an introduction to XMPP-based applications.
    Review: Roughly read
    BibTeX:
    @article{Saint-Andre2005,
      author = {Saint-Andre, Peter},
      title = {Streaming XML with Jabber/XMPP},
      journal = {IEEE Internet Computing},
      publisher = {IEEE Educational Activities Department},
      year = {2005},
      volume = {9},
      number = {5},
      pages = {82--89},
      url = {http://www.saint-andre.com/jabber/xtech2005.pdf},
      doi = {http://dx.doi.org/10.1109/MIC.2005.110}
    }
    					
    Saint-Andre2004 Saint-Andre, P. Extensible Messaging and Presence Protocol (XMPP): Core 2004   misc instant, messaging, presence, xmpp
    Abstract: This memo defines the core features of the Extensible Messaging and Presence Protocol (XMPP), a protocol for streaming Extensible Markup Language (XML) elements in order to exchange structured information in close to real time between any two network endpoints. While XMPP provides a generalized, extensible framework for exchanging XML data, it is used mainly for the purpose of building instant messaging and presence applications that meet the requirements of RFC 2779.
    Review: Read
    BibTeX:
    @misc{Saint-Andre2004,
      author = {Saint-Andre, P.},
      title = {Extensible Messaging and Presence Protocol (XMPP): Core},
      publisher = {RFC Editor},
      year = {2004},
      url = {http://tools.ietf.org/pdf/rfc3920.pdf}
    }
    					
    Saint-Andre2004a Saint-Andre, P. Extensible Messaging and Presence Protocol (XMPP): Instant Messaging and Presence 2004 Internet RFC 3921  misc imported
    Abstract: This memo describes extensions to and applications of the core features of the Extensible Messaging and Presence Protocol (XMPP) that provide the basic instant messaging (IM) and presence functionality defined in RFC 2779.
    Review: Read
    BibTeX:
    @misc{Saint-Andre2004a,
      author = {Peter Saint-Andre},
      title = {Extensible Messaging and Presence Protocol (XMPP): Instant Messaging and Presence},
      year = {2004},
      url = {http://tools.ietf.org/pdf/rfc3921.pdf}
    }
    					
    Schutt2008 Schϋtt, T.; Schintke, F. & Reinefeld, A. Scalaris: reliable transactional p2p key/value store 2008 ERLANG '08: Proceedings of the 7th ACM SIGPLAN workshop on ERLANG , pp. 41-48   inproceedings consensus, distributed-databases, distributed-systems, paxos, two-phase-commit
    Abstract: We present Scalaris, an Erlang implementation of a distributed key/value store. It uses, on top of a structured overlay network, replication for data availability and majority based distributed transactions for data consistency. In combination, this implements the ACID properties on a scalable structured overlay.
    BibTeX:
    @inproceedings{Schutt2008,
      author = {Schϋtt, Thorsten and Schintke, Florian and Reinefeld, Alexander},
      title = {Scalaris: reliable transactional p2p key/value store},
      booktitle = {ERLANG '08: Proceedings of the 7th ACM SIGPLAN workshop on ERLANG},
      publisher = {ACM},
      year = {2008},
      pages = {41--48},
      url = {http://dx.doi.org/10.1145/1411273.1411280},
      doi = {http://dx.doi.org/10.1145/1411273.1411280}
    }
    					
    Soparkar1994 Soparkar, N.; Levy, E.; Korth, H.F. & Silberschatz, A. Adaptive Commitment for Distributed Real-Time Transactions 1994 Proceedings of the Third International Conference on Information and Knowledge Management (CIKM'94), Gaithersburg, Maryland, November 29 - December 2, 1994 , pp. 187-194   inproceedings
    Abstract: Distributed real-time transaction systems are useful for both real-time and high-performance database applications. Standard transaction management approaches that use the two-phase commit protocol suffer from its high costs and blocking behavior which is problematic in real-time computing environments. Our approach in this paper is to identify ways in which a commit protocol can be made adaptive in the sense that under situations that demand it, such as a transient local overload, the system can dynamically change to a different commitment strategy. The decision to do so can be taken autonomously at any site. The different commitment strategies exploit a trade-off between the cost of commitment and the obtained degree of atomicity. Our protocols are based on optimistic commitment strategies, and they rely on local compensatory actions to recover from non-atomic executions. We provide the necessary framework to study the logical and temporal correctness criteria, and we describe examples to illustrate the use of our strategies.
    BibTeX:
    @inproceedings{Soparkar1994,
      author = {Nandit Soparkar and Eliezer Levy and Henry F. Korth and Abraham Silberschatz},
      title = {Adaptive Commitment for Distributed Real-Time Transactions},
      booktitle = {Proceedings of the Third International Conference on Information and Knowledge Management (CIKM'94), Gaithersburg, Maryland, November 29 - December 2, 1994},
      publisher = {ACM},
      year = {1994},
      pages = {187-194},
      url = {https://eprints.kfupm.edu.sa/22788/1/22788.pdf}
    }
    					
    Stallman2002 Stallman, R.M.; Pesch, R. & Shebs, S. Debugging with GDB - The GNU Source-Level Debugger 2002   book compilers, debuggers
    BibTeX:
    @book{Stallman2002,
      author = {Stallman, Richard M. and Pesch, Roland and Shebs, Stan},
      title = {Debugging with GDB - The GNU Source-Level Debugger},
      publisher = {GNU Press},
      year = {2002},
      url = {http://www.gnu.org/software/gdb/documentation/}
    }
    					
    Soegaard-Andersen Søgaard-Andersen, J.F.; Lynch, N.A. & Lampson, B.W. Correctness of Communication Protocols - A Case Study   misc
    BibTeX:
    @misc{Soegaard-Andersen,
      author = {Jørgen F. Søgaard-Andersen and Nancy A. Lynch and Butler W. Lampson},
      title = {Correctness of Communication Protocols - A Case Study},
      url = {http://publications.csail.mit.edu/lcs/pubs/ps/MIT-LCS-TR-589.ps.gz}
    }
    					
    Tennent1976 Tennent, R.D. The denotational semantics of programming languages 1976 Commun. ACM
    Vol. 19 (8) , pp. 437-453  
    article
    Abstract: This paper is a tutorial introduction to the theory of programming language semantics developed by D. Scott and C. Strachey. The application of the theory to formal language specification is demonstrated and other applications are surveyed. The first language considered, LOOP, is very elementary and its definition merely introduces the notation and methodology of the approach. Then the semantic concepts of environments, stores, and continuations are introduced to model classes of programming language features and the underlying mathematical theory of computation due to Scott is motivated and outlined. Finally, the paper presents a formal definition of the language GEDANKEN.
    BibTeX:
    @article{Tennent1976,
      author = {Tennent, R. D.},
      title = {The denotational semantics of programming languages},
      journal = {Commun. ACM},
      publisher = {ACM},
      year = {1976},
      volume = {19},
      number = {8},
      pages = {437--453},
      url = {www.csc.liv.ac.uk/~grant/Teaching/COMP317/densem.pdf},
      doi = {http://doi.acm.org/10.1145/360303.360308}
    }
    					
    Urban2001 Urban, P.; Defago, X. & Schiper, A. Chasing the FLP Impossibility Result in a LAN or How Robust Can a Fault Tolerant Server Be? 2001 in Proc. 20th IEEE Symp. on Reliable Distributed Systems (SRDS , pp. 190-193   inproceedings
    Abstract: Fault tolerance can be achieved in distributed systems by replication. However, Fischer, Lynch and Paterson have proven an impossibility result about consensus in the asynchronous system model. Similar impossibility results have been established for atomic broadcast and group membership, and should be as such relevant for implementations of a replicated service. However, the practical impact of these impossibility results is unclear. For instance, do they set limits to the robustness of a replicated server exposed to extremely high loads?

    The paper tries to answer this question by describing an experiment conducted in a LAN. It consists of client processes that send requests to a replicated server (three replicas) using an atomic broadcast primitive. The experiment has parameters that allow us to control the load on the hosts and on the network and the timeout value used by our heartbeat failure detection mechanism.

    Our main observation is that the atomic broadcast algorithm never stops delivering messages, not even under arbitrarily high load and very small timeout values (1 ms). The result was surprising to us, as we expected that our atomic broadcast algorithm would stop delivering messages at such small timeout values. So, by trying to illustrate the practical impact of impossibility results, we discovered that we had implemented a very robust replicated service.

    BibTeX:
    @inproceedings{Urban2001,
      author = {Peter Urban and Xavier Defago and Andre Schiper},
      title = {Chasing the FLP Impossibility Result in a LAN or How Robust Can a Fault Tolerant Server Be?},
      booktitle = {in Proc. 20th IEEE Symp. on Reliable Distributed Systems (SRDS},
      publisher = {Online Available: http://lsrwww.epfl.ch/Publications/ById/288.html},
      year = {2001},
      pages = {190--193},
      url = {http://lsrwww.epfl.ch/Publications/ById/288.html},
      doi = {http://dx.doi.org/10.1.1.24.6725}
    }
    					

    Created by JabRef on 25/07/2010.