Connections are everywhere! Any two parties that want to communicate need a connection between them. We use connections throughout the modern web stack, from client-server to server-database interactions. So why don't we hear about them more often?
Abstraction! The details of connections are often abstracted away (and for good reason!). However, if you plan to build a high-performance program of your own, a firm understanding of what goes on under the hood is definitely an advantage!
In this article, I will start with a common problem to motivate the use of connection pooling. …
I was first introduced to Remote Procedure Call (RPC) while working on a software engineering entry task back in December. I soon learned that RPC is an essential component of much of my company’s distributed infrastructure. As a new intern, I didn’t fully understand the benefits RPC brings to the table. I was just content to finish my entry task and start working on my internship project.
If you are in the same boat as me, let me try to explain what RPC is. RPC is a layer of abstraction that enables easy-to-program communication between a client and…
FLP (Fischer, Lynch, and Paterson) Impossibility is one of the most fundamental results in distributed systems and is taught in many undergraduate computing courses. Given the scale of data and computation today, distributed systems (scaling services by adding machines) are essentially the default pattern for building scalable infrastructure. Distributed consensus becomes crucial because the machines need to agree on a consistent state to provide a coherent service.
The FLP theorem answers the following fundamental question on consensus:
In an asynchronous distributed system, is there a deterministic consensus algorithm that can satisfy agreement, validity, termination, and fault tolerance?
A very common tradeoff pattern in distributed systems today goes as follows: to scale up services and achieve high performance, data is sharded horizontally across many servers. Having many servers means the probability of failure increases proportionately, and we can no longer avoid failures. The only thing we can do is make sure we are well prepared for them. To achieve this fault tolerance, companies use replication as a key strategy for keeping data highly available.
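To make the pattern concrete, here is a minimal sketch in Python. It assumes an in-memory dictionary per replica and a hash-based shard function; the names `shard_for` and `ReplicatedShard` are hypothetical and only serve the illustration. It is not how any particular production system does it.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash (illustrative choice)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ReplicatedShard:
    """One shard whose data is copied to several in-memory replicas."""

    def __init__(self, num_replicas):
        self.replicas = [{} for _ in range(num_replicas)]
        self.alive = [True] * num_replicas

    def write(self, key, value):
        # Naive replication: copy the write to every replica that is up.
        for index, replica in enumerate(self.replicas):
            if self.alive[index]:
                replica[key] = value

    def read(self, key):
        # Serve the read from the first replica that is still up.
        for index, replica in enumerate(self.replicas):
            if self.alive[index]:
                return replica.get(key)
        raise RuntimeError("all replicas of this shard are down")


shards = [ReplicatedShard(num_replicas=3) for _ in range(4)]


def put(key, value):
    shards[shard_for(key, len(shards))].write(key, value)


def get(key):
    return shards[shard_for(key, len(shards))].read(key)


put("user:42", "Ada")
# Simulate the failure of one replica; the read is served by another copy.
shards[shard_for("user:42", len(shards))].alive[0] = False
value_after_failure = get("user:42")
```

Note what the sketch leaves out: a replica that comes back after missing some writes would serve stale data, and nothing here decides which copy is authoritative.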
While replication may seem deceptively straightforward (just replicating state across different servers), various…
This article summarises the main lessons learnt from The Google File System (GFS) paper published in 2003.
Google File System (GFS) is a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance even with inexpensive commodity hardware, and delivers high average performance to a large number of clients. GFS is widely deployed within Google as a storage platform for applications that require the generation and processing of large data sets.
MapReduce is an interface that enables automatic parallelization and distribution of large-scale computation while abstracting over “the messy details of parallelization, fault-tolerance, data distribution, and load balancing”.
As the name suggests, MapReduce is inspired by the map and reduce functions present in many functional languages. These functions allow MapReduce to parallelize large computations easily and to use re-execution as its key mechanism for dealing with failures.
Before MapReduce, Google had implemented hundreds of computations that processed large amounts of raw data to compute various derived data.
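The map and reduce idea can be sketched as a single-process word count in Python. This is only an illustration of the programming model under the assumption that everything fits in memory; the function names are hypothetical, and the real system distributes the same steps across many machines.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Emit an intermediate (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group intermediate values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single result."""
    return key, reduce(lambda total, count: total + count, values, 0)


def word_count(documents):
    # Each map call depends only on its own input, so calls could run on
    # different machines, and a failed call could simply be run again.
    intermediate = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["the quick fox", "the lazy dog", "The end"])
```

Because `map_phase` and `reduce_phase` have no side effects, re-running one of them after a failure produces the same output, which is why re-execution works as a recovery mechanism.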
They found that most of these computations were conceptually straightforward. However, the difficulty…
Memcached is a well-known, simple, in-memory cache solution and is used by several large companies such as Facebook, Twitter, and Pinterest. The main use case for Memcached is look-aside caching, to reduce the load on the database. Unlike Redis, Memcached, known for its simplicity, does not offer any in-built high availability features.
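Look-aside caching can be sketched in a few lines of Python. This is a minimal illustration that uses a plain dictionary as a stand-in for a Memcached client; the class `LookAsideCache` and the function `fake_db_lookup` are hypothetical names, not part of any Memcached library.

```python
class LookAsideCache:
    """Application-side cache: check the cache first, fall back to the database."""

    def __init__(self, load_from_db):
        self._store = {}  # stand-in for a Memcached client (assumption)
        self._load_from_db = load_from_db

    def get(self, key):
        if key in self._store:
            # Cache hit: the database is not touched at all.
            return self._store[key]
        # Cache miss: the application reads the database and fills the cache.
        value = self._load_from_db(key)
        self._store[key] = value
        return value

    def invalidate(self, key):
        # Called after a database write so the next read fetches fresh data.
        self._store.pop(key, None)


db_calls = []


def fake_db_lookup(key):
    """Pretend database query that records every call it receives."""
    db_calls.append(key)
    return f"row-for-{key}"


cache = LookAsideCache(fake_db_lookup)
first = cache.get("user:1")   # miss: goes to the database
second = cache.get("user:1")  # hit: served from the cache
```

In this sketch only the first read reaches the database, which is the load reduction the pattern is used for. If the cache disappears, every read becomes a miss and falls through to the database.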
As a cache cluster grows, high availability (HA) becomes a critical issue. How do we respond quickly when a cache server crashes temporarily or even permanently? How do we prevent the resulting requests from overloading our database servers?
While there have been a few open-source…
Memcached is a free & open source, high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load. 
Memcached is known for its simplicity as a distributed cache. Its simple design not only makes it quick to deploy and easy to develop against, but has also been reported to make it faster than other cache systems such as Redis in some performance comparisons.
Benchmarking or stress testing is crucial for finding the bottlenecks and safe operating limits of any system. …
At the heart of Swift’s design are two powerful programming paradigms: protocol-oriented programming (POP) and class-based inheritance. POP helps solve some of the problems arising from class-based inheritance, such as intrusive inheritance, implicit sharing, and lost type relationships.
POP also improves model flexibility with features such as retroactive modeling using protocol extensions. Furthermore, as Swift doesn’t support multiple inheritance of classes, POP can achieve a similar effect by letting classes and structs conform to multiple protocols instead.
In the following sections, we will examine: