Introduction
Licensed under the Creative Commons Attribution-Share Alike 4.0 International license.
Understanding the basics: distributed systems
A distributed system is a set of nodes, two or more of which work in coordination to achieve a common outcome when carrying out a given task. Each node is capable of autonomous behavior, i.e. each node has its own memory and processor, and ideally each one can fail individually without affecting the overall performance of the network. The nodes also operate in a way that lets end users perceive them as a single logical unit.
For example, we perceive the Ethereum network as a single entity on which smart contracts are coded and executed harmoniously, but it is the sum of the individual work of each node that makes it possible for smart contracts to operate.
Distributed Systems have the following features:
- Resource sharing: components within a distributed system share both hardware and software resources to meet the needs of the network's tasks.
- Horizontal scaling: instead of upgrading the system by investing in hardware with greater capabilities, operators can simply add another computer to improve the network's performance.
Main drawbacks of distributed systems
The main challenges of running a distributed system are synchronization and fault tolerance. For example, nodes within the network may be unable to send and receive messages to and from each other, so they cannot update their state machines to stay in step with the global state of the network. Even worse, nodes can behave erratically due to malfunction or malicious intent. Be that as it may, the system must tolerate these faults and continue to work correctly.
The CAP theorem
The last statement needs further clarification: what does it mean for a network to keep functioning in spite of several nodes behaving in a faulty or malicious way? The CAP theorem provides the basic foundation for understanding how distributed systems operate. Before reviewing the CAP theorem, some basic concepts must be introduced:
- Consistency: a distributed network is consistent if every response it gives back fits the expected service specification.
- Availability: the network is available if each non-faulty node eventually returns a response. In real systems, however, a response delayed for too long is equivalent to the network not responding at all.
- Partition tolerance: partition tolerance is a characteristic of the system itself; it addresses the situation where communication between elements of the network (not including users) is unreliable. For example, when two geographically separated nodes are unable to communicate with each other to synchronize the shared state, we say the system is partitioned. If the system keeps functioning correctly in that scenario, it is said to be partition tolerant.
The CAP theorem states that in a network where communication failures occur (i.e. the system is partitioned) it is impossible to have both consistency and availability at the same time. Consider a system of two nodes running a distributed database service with read/write operations allowed for users. If the connection between the nodes dies while they are receiving user requests, neither node has any way of knowing what the other is doing. In this situation, if they attempt to respond to every request, they may start working with stale information (loss of consistency). But if a node stops responding to requests in order to avoid giving a wrong answer, then the network loses availability.
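To make the two-node scenario concrete, here is a minimal Python sketch of one replica of a key-value store that must decide how to answer reads while it cannot reach its peer. All names (`Replica`, `prefer_availability`, `replicate_to_peer`) are hypothetical and the network call is omitted; the sketch only shows the choice between answering with possibly stale data and refusing to answer.

```python
# Minimal sketch (hypothetical names): one replica of a key-value store that
# must choose how to answer reads while it cannot reach its peer.

class Replica:
    def __init__(self, prefer_availability: bool):
        self.store = {}                  # local copy of the shared state
        self.peer_reachable = True       # becomes False when the system is partitioned
        self.prefer_availability = prefer_availability

    def write(self, key, value):
        self.store[key] = value
        if self.peer_reachable:
            self.replicate_to_peer(key, value)   # keep the other node in sync
        # if the peer is unreachable, this write is only known locally

    def read(self, key):
        if self.peer_reachable:
            return self.store.get(key)   # normal case: state is synchronized
        if self.prefer_availability:
            return self.store.get(key)   # stay available, possibly returning stale data
        raise RuntimeError("partitioned: refusing to answer")  # stay consistent, lose availability

    def replicate_to_peer(self, key, value):
        pass  # network call omitted in this sketch
```

During a partition, a replica built with `prefer_availability=True` keeps answering (and may serve stale values), while one built with `prefer_availability=False` raises an error instead of risking a wrong response.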
How are the CAP theorem's limitations handled?
Trading consistency for availability
Trading availability for consistency
This approach is taken when network communication is usually reliable; data servers located in a single data center are a good example of this situation. When prioritizing consistency, a master-slave structure is often chosen for the network architecture: a master node synchronizes with the shared replicas in order to update the global state when receiving a request. The network updates the shared state before responding to the subsequent request, and thus high levels of consistency are achieved.
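The following Python sketch illustrates that consistency-first idea under the stated assumptions: the master acknowledges a write only after every replica has applied it, and rejects the write when a replica cannot be reached. The class and method names (`Master`, `handle_write`, `apply`) are hypothetical.

```python
# Minimal sketch of consistency-first (master-slave) replication, hypothetical names:
# the master only acknowledges a write after every replica has applied it.

class Replica:
    def __init__(self):
        self.state = {}

    def apply(self, key, value):
        self.state[key] = value
        return True        # in a real system this would be a network acknowledgment

class Master:
    def __init__(self, replicas):
        self.state = {}
        self.replicas = replicas                 # follower nodes holding shared copies

    def handle_write(self, key, value):
        for replica in self.replicas:
            if not replica.apply(key, value):    # synchronous update of each copy
                raise RuntimeError("replica unreachable: write rejected")
        self.state[key] = value                  # commit only after all copies agree
        return "ok"                              # subsequent reads will see this value
```

The design choice is visible in `handle_write`: if any replica is unreachable the write is rejected, which sacrifices availability to keep all copies consistent.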
Gluing pieces with different trade-offs
As stated earlier, some system designers choose to build a network by combining pieces with different trade-offs: some guarantee high levels of consistency while others offer high levels of availability. The system functions with several partitions; here we summarize some of the most common ones:
- Data partitioning: in some cases it is acceptable for a data update to be delayed, for example stock availability in an e-commerce service. In other cases data must always be consistent; the clearest examples are the balance of a bank account or bill payouts.
- Operation partitioning: an example of operation partitioning is a database where read-only operations are prioritized over the bulk of the traffic to raise availability, while write/append operations may stop functioning when the network experiences poor internal communication (see the sketch after this list).
- Functional partitioning: this strategy is adopted when a service is subdivided into several sub-services, so that each sub-service can have the particular trade-off that best fits its requirements.
- User partitioning: designers tend to choose this approach when a service is provided across large geographic areas or when the network must bear a massive workload, as in social media services. Users in different locations rely on servers located close to them; this guarantees high availability, and at the same time consistency is readily achieved if the communication between the servers is highly reliable.
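As a rough illustration of operation partitioning, the Python sketch below keeps read-only requests available from the local copy while rejecting writes whenever the node cannot synchronize with the rest of the network. The names (`PartitionedStore`, `network_healthy`) are hypothetical and replication is left out.

```python
# Minimal sketch of operation partitioning (hypothetical names): read-only requests
# keep being served from the local copy, while writes are rejected whenever the
# node cannot synchronize with the rest of the network.

class PartitionedStore:
    def __init__(self):
        self.local_copy = {}
        self.network_healthy = True    # flipped to False when inner communication degrades

    def read(self, key):
        # reads stay available, at the cost of possibly returning stale data
        return self.local_copy.get(key)

    def write(self, key, value):
        if not self.network_healthy:
            raise RuntimeError("writes disabled during partition")
        self.local_copy[key] = value   # in a real system this would also be replicated
```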
Final thoughts
It is important to have a clear understanding of the properties of distributed systems. When designing such a system, its intended use case will determine the appropriate trade-off between consistency and availability in each component of the distributed system.
In the next post we will consider the more general and theoretical case: we will see that the consistency-availability trade-off is a particular instance of the more general trade-off between safety and liveness in an unreliable system. We will also elaborate further on the characteristics of distributed systems.
References
- Seth Gilbert and Nancy Lynch. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. ACM SIGACT News, June 2002.
- Seth Gilbert and Nancy Lynch. Perspectives on the CAP Theorem. Computer, IEEE Computer Society, February 2012.