Consistent Backends and UX: How Do New Algorithms Help?
Article series
- Why do you need to pay attention?
- What problems may occur?
- What are the barriers to adoption?
- How does the new algorithm help?
The previous articles explained what data consistency is, the difference between "strong consistency" and "eventual consistency", and why this distinction matters more than ever to modern application developers. We also introduced the notion of a "consistency tax": the extra time and effort a development team has to invest when it chooses a system that offers only eventual consistency or limited consistency guarantees.
Many modern databases use state-of-the-art algorithms to eliminate the trade-off between consistency and performance. Of course, we don't want you to take our word for it without a proper explanation. Therefore, in this final installment, we dig into the technical details behind some of these databases. Often the only source of information on these technical details is a research paper, so the aim of this article is to explain these systems in simpler terms. Because the real systems are far more complex, we provide links in the text in case you want to know more and enjoy reading research papers.
Introduction
In Parts 1 and 2 of this series, we explained how distributed databases use different replicas to spread the load and/or serve users in different regions. To recap for new readers: a replica is simply a copy of the data. A replica can live in the same location for redundancy, or in another location to offer lower latency to the users there. Having multiple replicas that can handle both reads and writes is a powerful advantage, because the database becomes scalable and can offer lower latency to all users, wherever they are. However, you don't want each replica to have its own interpretation of the data. Instead of subtle data differences between replicas, you want one unique interpretation of the data, often called a single source of truth. To achieve that, you need some form of agreement on data changes. We need consensus.
Waiting for consensus
Every distributed database that aims to be consistent has multiple replicas that have to agree on the outcome of transactions. If conflicting data updates happen, these replicas have to agree on which update goes through and which does not. This is called "consensus."
Let's go back to our game to illustrate why we need consensus. Imagine that our player has only 3 gold coins left, but tries to buy two different items from two different shops at the same time, for a total price higher than the remaining 3 coins. This involves two transactions, one per item/shop, which we denote t1 and t2. Let's assume the owners of the shops are thousands of miles apart, so the two transactions run on two different replicas. If both transactions are accepted, the user can buy more than he can afford. How do we prevent the user from overspending?
We know these replicas need to communicate in order to agree on the final outcome of the two transactions. What we don't know is how much communication they need. How many messages have to go back and forth between replica 1 and replica 2 to agree on which transaction gets priority and which one is cancelled?
Because replicas in a distributed database are meant to serve users in different parts of the world with low latency, they are far apart by nature. By placing copies of the data closer to end users, those users can read with lower latency. However, when writes happen, the replicas need to exchange messages to update all copies uniformly, and these messages can take tens of milliseconds because their propagation around the globe is limited by the speed of light. It is clear that we need to keep the number of cross-data-center messages as small as possible so that end users are not left waiting for replicas around the world to reach consensus.
For a long time, this was thought to be impossible or impractical. But today, several techniques exist to keep the number of round trips low and latency within normal bounds.
The distance between New York and Paris is 5,839 km. Even at the speed of light, a signal needs roughly 40 milliseconds to travel from New York to Paris and back.
— Theoretical speed and real world speed
The most important remaining question is: "How many round trips do we need to execute a transaction?" The answer depends largely on the algorithms that are used.
How do we reach agreement?
It appears that, in order to reach agreement on something, you need at least four hops (two rounds of communication): one round to let each replica know that you are about to do something, and a second round to actually do it once everyone has agreed that it can be done. This is the idea behind distributed two-phase commit, which is used by almost all distributed databases. Let's look at an analogy. Imagine you have to agree with a group of people on a good date for a party. It might go like this:
First, Polly asks everyone whether they can make it to a party on Monday; she now knows that everyone can actually come. Next, she needs to let everyone know that the party will indeed be on Monday, and people acknowledge that they will attend.
These are very similar to the two phases in two-phase commit. Of course, databases don't throw parties, so the phases have different functions. In the case of a distributed system, the phases are called:
- Prepare or request to commit: make sure everyone knows about the transaction. In this phase, replicas in a distributed database store the query in some kind of to-do list (the transaction log) on disk, to make sure they still know what to do if the server goes down.
- Commit: actually calculate the results and store them (both phases are sketched in code below).
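To make the two phases concrete, here is a minimal sketch of a two-phase commit coordinator in Python. The `Replica` and `Coordinator` classes and their method names are illustrative stand-ins for real networked nodes, not any particular database's implementation:

```python
# Minimal sketch of two-phase commit (2PC); classes stand in for networked nodes.

class Replica:
    def __init__(self, name):
        self.name = name
        self.log = []    # the on-disk "to-do list" (transaction log)
        self.data = {}

    def prepare(self, txn_id, writes):
        # Phase 1: durably record the intent so the replica still knows
        # what to do after a crash, then vote yes or no.
        self.log.append(("prepare", txn_id, writes))
        return True      # vote "yes" (a real replica could also vote "no")

    def commit(self, txn_id, writes):
        # Phase 2: actually apply the changes and record the commit.
        self.data.update(writes)
        self.log.append(("commit", txn_id))


class Coordinator:
    def __init__(self, replicas):
        self.replicas = replicas

    def run_transaction(self, txn_id, writes):
        # Round 1: ask every replica whether it can commit.
        votes = [r.prepare(txn_id, writes) for r in self.replicas]
        if not all(votes):
            return "aborted"
        # Round 2: everyone agreed, so tell every replica to commit.
        for r in self.replicas:
            r.commit(txn_id, writes)
        return "committed"


replicas = [Replica("eu-west"), Replica("us-east")]
print(Coordinator(replicas).run_transaction("t1", {"gold": 1}))  # -> committed
```

The two rounds of messages in `run_transaction` are exactly the two round trips (four hops) discussed above; when the replicas sit in different data centers, each round costs a cross-data-center round trip.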
Of course, as always, things are never quite that simple. There are many flavors of such algorithms. For example, there are improvements on two-phase commit called Paxos and Raft, and there are in turn many variants of these (multi-Paxos/fast Paxos/...). These alternatives aim to fix problems with availability or performance. To understand the availability problems, just imagine that Polly gets sick, or that Amber's phone dies. In the former case, she can no longer continue her work as party coordinator; in the latter, Polly cannot find out whether Amber agrees with the party date. Raft and Paxos improve on this by requiring only a majority to answer and/or by automatically electing a new coordinator when the leader or coordinator goes down. A good animation that shows how Raft works can be found here.
Agreement on what?
Can we conclude that every distributed database then requires two round trips to write/read data? No, reality is more complex than that. On the one hand, there are many possible optimizations, and on the other hand, there may be multiple things we need to agree on:
- Agree on the time of the transaction
- Agree on whether or not a read can be executed
The simplest example with multiple rounds of two-phase commit is probably Cassandra's lightweight transactions. They first require consensus on the reads and then consensus on the writes. Each consensus round takes two round trips, so with messages that take 40 milliseconds to travel, that adds up to four round trips, or eight messages, meaning the whole transaction takes 320 milliseconds or longer, depending on the required "locks", which we will explain later.
This is easy enough to understand, but the implementation has some issues, since Cassandra was never designed to be strongly consistent. Does that mean strongly consistent databases are even slower? Not at all! Modern distributed databases use a mix of interesting features to achieve better performance.
Waiting for locks
Besides waiting for messages to reach agreement, almost every distributed database also uses "locks". Locks guarantee that data about to be changed by one transaction is not simultaneously changed by another. When data is locked, other transactions cannot change it, which means those transactions have to wait. The duration of such locks therefore has a big impact on performance. Again, this performance impact depends on the algorithms and optimizations the database implements. Some databases hold locks longer than others, and some databases do not use locks at all.
Now that we have covered enough of the basics, let's dive into the algorithms.
Modern consensus algorithms
We now know that consensus and locks are the main bottlenecks we need to optimize. So let's return to the main question of this article: "How do these new technologies bring these latencies down to an acceptable range?" Let's start with the first of these modern algorithms, which sparked interesting ideas in the rest of the database world.
2010 – Percolator
Percolator is an internal Google system built on top of BigTable, one of the early NoSQL databases built by Google, which Google used to update its search index incrementally at the speed at which pages were crawled. The first paper on Percolator was released in 2010 and inspired the first distributed database to build on its ideas: FoundationDB, in 2013. FoundationDB was later acquired by Apple and finally released a stable version in 2019, together with a FoundationDB paper.
Although Percolator allowed Google to speed up page crawling significantly, it was not originally built as a general-purpose database. Rather, it was designed as a fast and scalable incremental processing engine to support Google's search index. Since the search index had to be scalable, many calculations had to happen on many machines simultaneously, which called for a distributed database. As we learned in the previous articles, programming against distributed systems that store data can be complex and traditionally requires developers to pay a "consistency tax" to work around unpredictable database behavior. To avoid paying such a high consistency tax, Google chose a strong consistency model when building Percolator.
Percolator's consistency model could not exist without two key elements: versioning and the timestamp oracle.
Element 1: Version control
As we mentioned in the previous articles, strong consistency requires us to agree on a global order for our transactions. Versioning is one of the key elements of many such algorithms, since it can be used for failure recovery, to help replicate data, and to support a consistency model called "snapshot isolation."
Versioning helps with failure recovery when a node fails or gets disconnected. When the node comes back online, thanks to the versions, it can easily restore its state: it starts from the last snapshot it was able to save and then replays the transactions based on the versions held by another node. All it has to do is ask another node: "Hey, what has changed since I was gone?" Without versioning, it would have to copy over all the data, which would put an enormous strain on the system.
Failure recovery is great, but the strongest advantage lies elsewhere: this kind of versioning system can be used to implement a strongly consistent model. If the versioning system keeps a version for every data change, we can actually go back in time and run queries against an earlier version of our data.
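As a toy illustration of this catch-up mechanism, the sketch below shows how a recovering node with a versioned change log only has to replay what it missed. The data and version numbers are made up for the example:

```python
# Illustrative catch-up with versioned changes: a recovering node asks a peer
# only for the transactions committed after its last known version.

peer_log = [
    (1, {"gold": 3}),
    (2, {"gold": 1}),
    (3, {"sword": 1}),
]  # (version, change) pairs kept by a healthy replica

def catch_up(last_seen_version, log):
    """Return only the changes committed after last_seen_version."""
    return [(v, change) for v, change in log if v > last_seen_version]

# The node went offline after version 1, so it replays versions 2 and 3
# instead of copying the whole dataset over again.
print(catch_up(1, peer_log))  # -> [(2, {'gold': 1}), (3, {'sword': 1})]
```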
Some clever minds found out that this historical querying capability can be used to provide a consistency model called "snapshot consistency." The idea of snapshot consistency is to pick a version of the data at the beginning of the query, work with that version for the rest of the query, and write a new version at the end of the query.
There is one possible pitfall here: while such a query is running, another query could be writing data that conflicts with the first query. For example, if two write queries both start from the same snapshot of a bank account with $1,000 on it, they could both spend the money, since they do not see each other's writes. To prevent that, an additional check happens before either query writes its result, to see whether the values of the snapshot have changed. If a conflict did occur that changed the snapshot's values, the transaction is rolled back and has to be restarted.
However, Percolator still had one problem to solve. Clocks on different machines can easily drift apart by a few hundred milliseconds. If a query's data is spread over multiple machines, as in our initial example, you cannot simply ask both machines to provide data at a given timestamp, since their ideas of the current time differ slightly. It's a matter of milliseconds, but when many transactions have to be processed, a few milliseconds are all it takes to go from correct data to faulty data.
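The sketch below illustrates this optimistic check under snapshot consistency, using the bank-account example: both transactions read from the same snapshot, but only the first one to commit succeeds, while the second detects the version change and has to retry. The data layout is made up for the illustration:

```python
# Illustrative snapshot-isolation conflict check with versioned values.

store = {"balance": {"version": 1, "value": 1000}}

def begin(keys):
    # Remember the version of everything we read from the snapshot.
    return {key: store[key]["version"] for key in keys}

def commit(read_versions, writes):
    # Conflict check: did anything we read change since our snapshot?
    for key, seen_version in read_versions.items():
        if store[key]["version"] != seen_version:
            return "aborted, please retry"
    for key, value in writes.items():
        store[key] = {"version": store[key]["version"] + 1, "value": value}
    return "committed"

t1 = begin(["balance"])              # both start from the $1,000 snapshot
t2 = begin(["balance"])
print(commit(t1, {"balance": 400}))  # -> committed
print(commit(t2, {"balance": 200}))  # -> aborted, please retry
```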
Time synchronization brings us to the second element of Percolator.
Element 2: Timestamp oracle
Percolator's solution to the time-synchronization problem is called the timestamp oracle. Instead of letting each node decide the time by itself (which would not be accurate enough), Percolator uses a central system that exposes an API providing timestamps. The node this system runs on is the timestamp oracle. Since we keep multiple versions of our data, each query needs at least two timestamps. First, we need a timestamp for the snapshot from which we read the data. Then, at the end of the transaction when we are ready to write, we need a second timestamp to tag the new data version. The downside is therefore that Percolator needs at least two calls to the timestamp oracle, which introduces even more latency if the oracle sits in a different region from the nodes making the calls. When Google came up with their distributed database Spanner, they solved this problem.
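A minimal sketch of the idea, with illustrative names, is shown below: one central counter hands out strictly increasing timestamps, and each transaction calls it twice, once for its read snapshot and once to stamp its writes:

```python
# Illustrative timestamp oracle: a single central counter hands out strictly
# increasing timestamps, so all replicas agree on the order of data versions.

class TimestampOracle:
    def __init__(self):
        self._counter = 0

    def next_timestamp(self):
        self._counter += 1
        return self._counter


oracle = TimestampOracle()

def run_transaction(read_snapshot, compute_writes):
    start_ts = oracle.next_timestamp()    # call 1: pick the read snapshot
    snapshot = read_snapshot(start_ts)
    writes = compute_writes(snapshot)
    commit_ts = oracle.next_timestamp()   # call 2: stamp the new data versions
    return start_ts, commit_ts, writes

# Each transaction pays at least two calls to the oracle, which is exactly the
# extra latency described above when the oracle lives in another region.
print(run_transaction(lambda ts: {"gold": 3},
                      lambda snap: {"gold": snap["gold"] - 1}))
# -> (1, 2, {'gold': 2})
```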
2012 – Spanner
Spanner was the first globally distributed database to offer strong consistency, which essentially means that you get low-latency reads without having to worry about potential database errors anymore. Developers no longer need to invest extra work to work around potential bugs caused by eventual consistency. The paper was released in 2012, and Spanner was released to the general public in 2017 as Google Cloud Spanner.
Element 1: Version control
Google built Spanner after its experience with Percolator. Since Percolator's versioning system had proven to work, they kept it in Spanner's design. This versioning system makes it possible to do very fast reads (snapshot reads) if you are willing to give up consistency. In that case, you run a query and give Spanner a maximum age for the results. For example: "Please return my current inventory as fast as possible, but the data may be up to 15 seconds old." Basically, instead of abandoning consistency, you can now choose, per query, the degree of consistency that suits your use case.
Element 2: TrueTime
To eliminate the overhead of synchronizing time between machines, Spanner abandoned the timestamp oracle in favor of a new concept called TrueTime. Instead of having one central system provide a unified view of time, TrueTime tries to reduce the clock drift between the machines themselves. Engineers at Google managed to bound local clock drift by implementing a time-synchronization protocol based on GPS and atomic clocks. This synchronization algorithm allowed them to bound clock drift to within 7 milliseconds, but it requires specific hardware combining GPS and atomic-clock technology.
Of course, there is still a potential clock drift of 7 milliseconds, which means that two servers could still interpret a timestamp as two different snapshots. This is solved by the third element of Spanner: commit wait.
Element 3: Commit wait
In fact, the TrueTime API does not return a single timestamp but an interval within which the current timestamp is guaranteed to lie. Once Spanner is ready to commit, it just waits a few milliseconds to cover the potential drift, which is called "commit wait." This makes sure that the timestamp assigned to the write has already passed on all nodes. It is also the reason that running Spanner on commodity hardware cannot deliver the same guarantee, since the wait would have to be hundreds of milliseconds long.
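The sketch below illustrates the commit-wait idea in simplified form. The `EPSILON` value stands in for the bounded clock uncertainty, and the interval API is a simplification of what TrueTime exposes, not its actual signature:

```python
import time

# Simplified sketch of TrueTime-style commit wait.
EPSILON = 0.007  # stands in for the ~7 ms bound on clock uncertainty

def truetime_now():
    """Return an interval [earliest, latest] that surely contains the true time."""
    now = time.time()
    return now - EPSILON, now + EPSILON

def commit(writes):
    _, latest = truetime_now()
    commit_ts = latest                    # pick a timestamp at the upper bound
    while truetime_now()[0] < commit_ts:  # commit wait: spin until the timestamp
        time.sleep(0.001)                 # has certainly passed on every clock
    return commit_ts, writes

ts, _ = commit({"gold": 2})
print("committed at", ts)
```

With a 7 millisecond uncertainty bound the wait is negligible; with the hundreds of milliseconds of drift you can get on commodity hardware, the same wait would dominate the transaction's latency, which is the point made above.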
2012 – Calvin
The first paper on the Calvin algorithm was published in 2012, by researchers at Yale University. Just like the previous approaches, Calvin consists of several elements. Although versioning is part of it, the rest of the approach is radically different and requires a few extra elements to work: deterministic calculations, and separating the ordering from locking. These are elements that are typically not found in databases with a traditional architecture. By changing the architecture and accepting that queries must be deterministic, Calvin can reduce the worst-case number of cross-data-center messages to two. This pushes the worst-case latency of global transactions far down, below 200 milliseconds, or theoretically even below 100 milliseconds. Of course, in order to believe that this is possible, you probably want to know how it works first, so let's take a look at the algorithm.
Element 1: Version control
Similar to Percolator and Spanner, Calvin relies on versioned data. These snapshots in Calvin are mainly used to guarantee fault tolerance. Each node stores different snapshots, which can be considered checkpoints. A disconnected node that comes back online only needs to grab the timestamp of the last checkpoint it witnessed and ask another node to inform it of all the transactions that came after that checkpoint.
Element 2: Deterministic calculation
Many front-end developers will have heard of the Elm front-end framework, which implements a React/Redux-like workflow. Elm has a steeper learning curve than similar JavaScript-based frameworks because it requires you to learn a new language. However, because the language is functional (no side effects), Elm allows some impressive optimizations. The key is that functions in Elm give up destructive manipulations in order to be deterministic. You can run the same function with the same input twice and it will always yield the same result. Because they are deterministic, Elm can now decide more efficiently how to update views.
Similar to Elm, Calvin also gave something up in order to speed up the calculations. In Calvin's case, we can basically say that the result of a transaction will be the same whether it is executed on machine A or machine B. This might seem obvious, but databases typically do not guarantee it. Remember that SQL lets you use the current time, or allows so-called interactive transactions where user input can be inserted in the middle of a transaction; both of these could violate the guarantees Calvin provides.
To achieve deterministic calculations, Calvin (1) needs to take calculations such as the current time out of the transaction and pre-calculate them, and (2) does not allow interactive transactions. Interactive transactions are transactions where a user starts a transaction, reads some data, provides some additional input in the middle, and then finally does some extra calculations and possibly some writes. Since the user is not predictable, such a transaction is not deterministic. In essence, Calvin trades in a minor convenience (interactive transactions) for great performance.
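A small sketch of what "deterministic" means in practice: anything non-deterministic, such as the current time, is resolved once up front and shipped along with the transaction, so every replica that replays it computes exactly the same state. The function names are illustrative:

```python
import time

# Illustrative deterministic transactions: non-deterministic inputs (like the
# current time) are computed once and recorded as part of the transaction, so
# replaying the same transaction on any replica yields the same result.

def make_transaction(item, price):
    return {
        "item": item,
        "price": price,
        "purchased_at": time.time(),  # resolved once here, not on every replica
    }

def apply_transaction(state, txn):
    # A pure function of (state, txn): no clock reads, no user input mid-way.
    new_state = dict(state)
    new_state["gold"] -= txn["price"]
    new_state.setdefault("items", []).append((txn["item"], txn["purchased_at"]))
    return new_state

txn = make_transaction("sword", 2)
replica_a = apply_transaction({"gold": 3}, txn)
replica_b = apply_transaction({"gold": 3}, txn)
print(replica_a == replica_b)  # -> True: both replicas reach the same result
```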
Separating the problem of ordering
"Databases spend a lot of time negotiating locks to make it look like the system is executing in a specific order. If order is all you need, maybe we can separate the locking problem from the ordering problem. But that means your transactions have to be pure."
— Kyle Kingsbury
Separating the ordering of transactions from their actual execution has been considered many times in the database world, but without much success. However, when your transactions are deterministic, separating the ordering from the rest of the algorithm actually becomes feasible. In fact, the combination of deterministic calculations and separating the ordering from the rest of the algorithm is extremely powerful, because it helps to reduce lock duration and greatly diminishes the slower communication between distant nodes (cross-data-center communication).
Shorter lock duration
Whenever locks are held on a portion of the data, other queries that use that data have to wait. Therefore, the shorter the lock duration, the better the performance. The figure below shows an overview of the locking procedure in Calvin compared to how a traditional distributed database might do it. Most databases keep the data locked until there is at least agreement on what to write, whereas Calvin only keeps the lock until all nodes agree on the order. Because the calculations are deterministic and everyone agreed on the order, each node can calculate separately and still arrive at the same end result.
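The sketch below illustrates this order-then-execute split in a Calvin-like style. The shared log stands in for the consensus step (normally Raft or Paxos), and everything else runs locally on each replica, so no locks are held during cross-data-center communication. It reuses our earlier game example, where t1 and t2 together cost more than the player's 3 gold coins:

```python
# Illustrative Calvin-style flow: replicas agree only on the *order* of
# transactions (the shared log stands in for the consensus step), and then each
# replica executes the agreed log locally and deterministically.

shared_log = []   # the globally agreed order (in reality decided via Raft/Paxos)

def sequence(txn):
    shared_log.append(txn)   # the only step that needs cross-node agreement

def execute_log(initial_state, log):
    # Runs on every replica independently; deterministic execution guarantees
    # that all replicas end up in the same state without further messages.
    state = dict(initial_state)
    for txn in log:
        if state["gold"] >= txn["price"]:   # t2 is rejected here, but rejected
            state["gold"] -= txn["price"]   # identically on every replica
    return state

sequence({"name": "t1", "price": 2})
sequence({"name": "t2", "price": 2})

replica_1 = execute_log({"gold": 3}, shared_log)
replica_2 = execute_log({"gold": 3}, shared_log)
print(replica_1, replica_2)  # -> {'gold': 1} {'gold': 1}
```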
Reduced communication between distant nodes
Besides the advantage in lock duration, separating the ordering from the rest of the algorithm also requires less communication. As explained earlier with the Cassandra example, a distributed database typically requires cross-data-center communication in many phases of its algorithm. In the case of Calvin, the only moment we need to agree on something is the moment we determine the order. With the Raft protocol, this can be done in two hops, which makes it possible to achieve latencies below 100 milliseconds for read-write queries.
Together with the reduced lock time, this also brings superb throughput. The original Calvin paper also ran experiments showing that this approach significantly outperforms traditional distributed database designs under high-contention workloads. Their results of roughly half a million transactions per second on a cluster of commodity machines rivaled the world-record results at the time, which were obtained on high-end hardware.
Run on any hardware
Besides that, Calvin has another advantage: it no longer requires specific hardware to obtain such results. Since Calvin can run on commodity machines, it can run on any cloud provider.
2014 – FaunaDB's Consensus Style
Element 1: Version control
FaunaDB has its own distributed transaction protocol with some similarities to Calvin. Just like the previous approaches, FaunaDB's data is versioned. Since versioning is not only useful for the consistency model but can also have business value, FaunaDB has upgraded this mechanism to a first-class citizen that can be used by end users. This feature makes time-travel queries possible: an end user can run a query on historical data to answer questions such as "What would the result of this query have been 20 days ago?" This is useful for recovering data that was accidentally overwritten, auditing data changes, or simply adding time travel to your application's features.
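As a toy illustration of how versioned data enables such "as of" queries, the sketch below keeps every write together with its timestamp and lets a read pick the latest version at or before the requested moment. This is a generic sketch, not FaunaDB's actual API:

```python
# Toy versioned store: every write keeps its timestamp, so a read can ask for
# the value "as of" any moment in the past.

history = {
    "inventory": [(10, 100), (25, 80), (40, 95)],  # (timestamp, value) pairs
}

def read_as_of(key, timestamp):
    """Return the latest version written at or before `timestamp`."""
    versions = [value for ts, value in history[key] if ts <= timestamp]
    return versions[-1] if versions else None

print(read_as_of("inventory", 30))  # -> 80: the value as it was at time 30
print(read_as_of("inventory", 50))  # -> 95: the current value
```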
Elements 2 and 3: Deterministic calculation and separation
Like Calvin, FaunaDB also uses deterministic calculations and separates the problem of ordering from the rest of the algorithm. Despite the similarities, calculating the transaction happens in a different phase in FaunaDB than in Calvin. Where Calvin uses the deterministic nature to execute the same transaction multiple times once the order is set, FaunaDB calculates the transaction only once, before consensus on the order of the transactions is reached. That brings us to the fourth element.
Element 4: Optimistic calculation
FaunaDB adds a fourth element, which we already encountered when we discussed snapshot isolation: optimistic calculations instead of locking.
FaunaDB does not lock; instead, it optimistically calculates the result of the transaction once, on the node that received the transaction, and then adds both the result and the original input values to the log. Where Calvin would have saved the query that needs to be executed in the transaction log, FaunaDB saves both the result of the calculation and the original input values in the log. Once consensus on the order in which the results must be applied has been reached, FaunaDB verifies whether the input data of that calculation has changed in the meantime (thanks to versioning). If the input values have changed, the transaction is aborted and restarted; if they have remained the same, the results are applied on all nodes without any additional calculation.
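The sketch below illustrates this "compute once, validate after ordering" idea in simplified form. The receiving node executes the transaction speculatively and logs both the inputs it read (with their versions) and the result; after the order is agreed, every node just checks the inputs and applies the stored result. All names and data are illustrative, not FaunaDB's actual implementation:

```python
# Illustrative "compute once, validate later" flow.

store = {"gold": {"version": 7, "value": 3}}

def speculative_execute(price):
    # Runs once, on the node that received the transaction.
    snapshot = {"gold": dict(store["gold"])}        # inputs read, with versions
    result = {"gold": snapshot["gold"]["value"] - price}
    return {"reads": snapshot, "result": result}    # this is what goes in the log

def apply_after_ordering(log_entry):
    # Runs on every node after consensus on the order; no re-computation here,
    # just verify the inputs and apply the precomputed result.
    for key, seen in log_entry["reads"].items():
        if store[key]["version"] != seen["version"]:
            return "aborted, restart transaction"
    for key, value in log_entry["result"].items():
        store[key] = {"version": store[key]["version"] + 1, "value": value}
    return "applied"

entry = speculative_execute(price=2)  # computed once, on the receiving node
print(apply_after_ordering(entry))    # -> applied (the inputs were unchanged)
```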
FaunaDB's algorithm has advantages similar to Calvin's, but it reduces the amount of computation required in the cluster.
Conclusion
In this series, we have explained how strong consistency can help you build error-free applications more efficiently. In this last article, we further explained how revolutionary ideas can power a new generation of distributed databases that are both consistent and performant. While the takeaway in the previous articles was "consistency matters", the takeaway of this final article is captured in the following:
In the near future, when you read a phrase such as:
"Many NoSQL databases do not offer atomic writes across multiple documents, and in return deliver better performance. And while consistency is another great feature of SQL databases, it impedes the ability to scale a database out across multiple nodes, so many NoSQL databases give up consistency." (from "The biggest challenge of migrating to NoSQL")
Be aware that modern algorithms enable databases to deliver consistency without centralization. In this article, we have seen a few examples of algorithms and databases that do this. Databases that build on these algorithms are a next generation of databases that can no longer be described with simple categories such as NoSQL, SQL, or even NewSQL.
With distributed cloud databases based on Percolator, Spanner, Calvin, and FaunaDB's transaction protocol, you can have highly performant distributed databases that come with stronger consistency models. That means you can build data-intensive applications that deliver low latency without having to worry about data errors, performance, or service configuration. In such systems, consistency is transparent, and you don't have to think about it as a developer. The next time you choose a database, pick one that is consistent by default.
Article series
- Why do you need to pay attention?
- What problems may occur?
- What are the barriers to adoption?
- How does the new algorithm help?