DEVOPS & CLOUD
5 techniques to consider for your distributed edge database
By Nick DelRegno, Fellow, Verizon
In recent years, the rise of virtualization and cloud computing has enabled the amazing growth of online services and applications. While many users of cloud-enabled services think of the cloud as being as ubiquitous and geographically distributed as the internet itself, this is not the case.
Most cloud providers have centralized their compute and storage infrastructure to a select number of massive data centers. While this limited footprint supports the scalability of cloud services and provides geographic redundancy, it comes at the cost of latency between the end user and the cloud-based application.
Using Verizon 5G Edge with Amazon Web Services® (AWS®) Wavelength can help deliver the benefits of cloud computing while distributing compute resources more widely to reduce the latency associated with cloud services. This reduces the network transmission and routing delays associated with backhauling end-user traffic to a small number of data centers, and it reduces the amount of network bandwidth required to serve those users. For static applications, this is a win-win scenario. Highly dynamic, stateful applications, however, can face scaling challenges introduced by large-scale production, distribution, storage and retrieval of data, especially on a real-time, low-latency basis.
This post discusses the impact of such distribution on large-scale database design and explores some emerging solutions that may help enable highly scalable, highly consistent stateful applications at the edge.
Database scaling for cloud applications
Databases are the backbone of most online applications and services. Since their inception, databases — and how to scale them — have been the subject of much research and effort. Still, the explosion of online applications, ranging from online commerce to customer relationship management (CRM) to social networking and other use cases, has driven the need to scale databases while fundamentally changing their nature.
While relational databases are still king for numerous applications, many of the latest massive applications and services leverage newer database types such as key-value stores, document stores and graph databases. Regardless of type, these databases all require massive scale while maintaining consistency, availability and partition tolerance. Research and effort have helped maximize these key requirements, although there is a limit to how far any one of them can be pushed without impacting the others.
The significance of the CAP theorem
In general, the “CAP” theorem says that it is impossible for a distributed datastore to simultaneously provide more than two of the following three guarantees:
- Consistency: Every read receives the most recent write
- Availability: Every read/write request receives a non-error response
- Partition tolerance: The system continues to operate in the presence of communication loss or delay between the distributed nodes
Basically, failures are bound to happen in a distributed system, requiring the system design to choose between consistency and availability. Consistency failures result in reads of possibly stale information. Availability failures result in the inability to read data at all, at least temporarily. The impact of these failures depends on the application, so there is no one-size-fits-all approach to database scaling.
While the CAP theorem is often applied to database design where nodes are geographically distributed, it applies equally to scaling techniques within a single data center. This post specifically explores geographic distribution.
Database scaling techniques
This section explores the common techniques used to scale databases and the impact of edge distribution.
1. Vertical scaling
Vertical scaling is the addition of hardware resources to a given database instance. For example, as CPU, storage and memory become better and cheaper, the hardware can be upgraded to enable scaling of the database instance. This scaling is limited by hardware maximums and provides no additional redundancy. Generally, edge compute has no impact on the ability to vertically scale a database, other than perhaps any limitations on power, space or heat dissipation required by the newer hardware.
2. Horizontal scaling
Horizontal scaling is the addition of more machines into a pool of resources and making them act as a single unit. This is helpful when discrete functions in a database service are separable and allowed to vertically scale independently of other functions. Each separable function can run on its own hardware, which can then scale vertically as well, depending on the load requirements. This adds complexity to the database service interprocess communication, as well as general cluster management and fault tolerance. Horizontal scaling can be performed in edge compute without inherent issues.
Another way to scale horizontally is to maintain multiple copies of the same database while load balancing the I/O transactions among them. Assuming the database is contained within a single 5G Edge location, edge compute adds no additional complexity to horizontal scaling.
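One simple way to spread reads across identical copies, as described above, is round-robin balancing. The sketch below is a toy illustration with hypothetical replica endpoint names; a real deployment would do this in a proxy or the database driver.

```python
# Toy round-robin read balancer across identical database replicas.
# The replica names are illustrative, not real endpoints.
from itertools import cycle

replicas = cycle(["db-replica-1", "db-replica-2", "db-replica-3"])

def next_replica() -> str:
    """Pick the replica that should serve the next read, round-robin."""
    return next(replicas)

# Four consecutive reads wrap around the pool of three replicas:
print([next_replica() for _ in range(4)])
# ['db-replica-1', 'db-replica-2', 'db-replica-3', 'db-replica-1']
```

Because every copy holds the same data, any replica can serve any read; the balancer only needs to spread load, not route by key.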
3. Database partitioning
A form of horizontal scaling, database partitioning enables the datastore itself to span multiple machines. One approach is to partition the data based on units of tables or columns of tables. All of the data in a given table is contained within a single machine. Each machine can contain many tables, but no table spans more than one machine. When the table is too large for a given instance, it can be broken into groups of columns and distributed across multiple machines as independent tables consisting of columns of the original table. This is known as vertical partitioning.
Another common approach is partitioning data horizontally, where the same tables exist across multiple instances but different groups of rows are contained in each instance. For example, in a customer database, rows with a last name between “A” and “M” could be stored on one machine, while rows with a last name between “N” and “Z” are stored on another. Either approach can be called “sharding,” but horizontal partitioning is what the term typically means.
Sharding adds complexity to reads and writes, as either:
- The application must understand the shard key and make reads and writes to the correct server
- Or a centralized sharding function converts generic reads and writes into reads and writes to appropriate shards based on the shard key
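The routing logic either option requires can be sketched with the last-name example above. The shard names and ranges here are illustrative assumptions, not from any particular database.

```python
# Minimal sketch of shard-key routing for the last-name range example.
# Shard names and ranges are hypothetical.
SHARDS = {
    "shard-1": ("A", "M"),  # last names A-M
    "shard-2": ("N", "Z"),  # last names N-Z
}

def route(last_name: str) -> str:
    """Return the shard that owns a given last name (the shard key)."""
    initial = last_name[0].upper()
    for shard, (low, high) in SHARDS.items():
        if low <= initial <= high:
            return shard
    raise KeyError(f"no shard owns initial {initial!r}")

print(route("Garcia"))  # shard-1
print(route("Nguyen"))  # shard-2
```

Whether this lookup lives in the application or in a centralized routing tier, every read and write must pass through it, which is exactly the added complexity the bullets above describe.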
Sharding complicates indexing and introduces new failure points: If a shard fails, the rows it holds become unavailable, and queries spanning the table can fail with it. Sharding also brings complexity influenced by the CAP theorem, since the data is now distributed across machines with potential communication faults between them.
Since sharding is typically done within a data center, or within a single database cluster, edge compute adds no inherent burdens on sharding. If sharding is done across data centers, having a large number of shards due to the expanded footprint of 5G Edge compute exacerbates some of the complexities above.
4. Replica sets
Read replicas are a commonly used approach to scale databases and are especially beneficial for applications that experience far more reads than writes. This approach involves adding servers that hold replicas, or copies, of the data for read purposes. All writes still must go to a centralized primary write server, but then all changes are synchronized with the read replicas.
As read demand grows, additional replicas can be added and distributed to service the load. Further, since all of the replicas contain an exact copy of the data on the primary, when the primary fails, a replica can be promoted to primary.
However, replicas can introduce problems and complexity. The more read replicas there are, the more work there is for the primary to ensure replication to all of the secondaries. In a small set of replicas, this is trivial and can be done with direct replication between the primary and its replicas. As the number of replicas increases, this can become a scaling limitation on the primary. To acknowledge a write request, the primary must wait for the replicas to confirm the update. This can be expedited by majority vote mechanisms where the primary waits for the majority to acknowledge the update; however, as the number of replicas required to form the majority increases, the time it takes for that majority to be achieved increases. This is naturally exacerbated by any latency between the replicas and the primary.
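The latency effect described above can be shown with a toy model (this is arithmetic about quorum sizes, not real replication code): the primary can acknowledge once the fastest majority of replicas has confirmed, so the acknowledgment time is governed by the slowest member of that majority.

```python
# Toy model of majority-vote write acknowledgment latency.
# Replica round-trip times (RTTs) are hypothetical, in milliseconds.
def majority(n_replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return n_replicas // 2 + 1

def ack_latency_ms(replica_rtts_ms: list[float]) -> float:
    """Time until a majority of replicas has acknowledged the write:
    the majority-th smallest replica RTT."""
    needed = majority(len(replica_rtts_ms))
    return sorted(replica_rtts_ms)[needed - 1]

# Three nearby replicas vs. five geographically spread ones:
print(ack_latency_ms([2.0, 3.0, 4.0]))               # 3.0 (needs 2 of 3)
print(ack_latency_ms([2.0, 3.0, 40.0, 60.0, 80.0]))  # 40.0 (needs 3 of 5)
```

Adding distant replicas grows the majority and pulls slower members into it, which is why edge-scale replica counts can stall write acknowledgment at the primary.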
Distributed replicas can also introduce delays between writes and reads of a given piece of data. Because the replicas are typically distributed, a write in Los Angeles takes time to be replicated to New York. As such, a read replica in New York can serve stale data until the update is received. From speed-of-light delays alone, this staleness can persist for tens of milliseconds.
Another issue emerges when partitioning replicas and restoring a replica once the partition is resolved. Approaches range from sending transaction logs from the primary to enable the replica to catch up, to completely rebuilding the replica as if it were new to the replica set.
As mentioned earlier, database replication can be greatly impacted by edge distribution. As the number of replicas increases, the demand on the replication mechanism increases. Further, as replicas are pushed closer to the edge, care must be taken to limit the candidates for promotion to primary. Since most replication approaches require a single primary write server, the replica elevated to primary should not sit in a geographically suboptimal location or be constrained in network capacity or write-volume scaling.
Another consideration when distributing replicas to the edge is data duplication and its associated costs. Most generic replication results in full copies of the database at all replica locations. With only a handful of locations, this is not a problem; it is, in fact, desirable for redundancy.
As replication expands, one must consider the impact of having complete copies of large databases in relatively small locations. The benefits of the significant reduction in read latencies could be negated by the cost of holding large amounts of likely irrelevant data at the edge. Geographic sharding of the data could be beneficial, with the added complexity of reads spanning multiple shards that are geographically distributed, as well as the increased likelihood of network partitions isolating shards.
5. Replicas and sharding
Sharding of replicas can be beneficial as well. As mentioned above, many replication solutions result in the full database existing at all replica locations. As such, many of the scaling techniques above apply to replicas as well, including vertical scaling and sharding.
But what about the writes?
The scaling approaches above mostly improve performance and scalability for read operations. While write operations can be scaled using some of the vertical, horizontal and partitioning approaches above, they can’t readily benefit from widespread geographic replication.
Most distributed databases have a primary write server that distributes the data to the read replicas. As such, having read replicas at the edge, as close to the customer as possible, only improves read performance. Writes still must go to the primary write server, which could be quite far away, especially in national-, continental- or global-scale applications. For read-heavy applications, this is fine. However, many emerging applications require very low-latency reads and writes. Multiprimary writes can be supported in databases; however, doing so introduces issues of consistency, integrity and performance.
In a single primary scenario, to which all writes must go, consistency is comparatively simple. When the write request is received, the database is locked, the values are created or updated, the lock is removed, and the request is acknowledged. In a distributed environment, this cannot be achieved feasibly. In theory, each write would lock the global database to ensure only a single write at a time.
In practice, this can be done when the number of primaries is exceedingly small, but even then, issues can arise. For example, as the primaries are not collocated, conflicting locks can occur, with a resolution algorithm required. Further, primary-to-secondary replication becomes challenging, especially in the event of primary failure and promotion of a secondary. It also would be highly inefficient to lock a global database for each write, even if it were achievable.
The alternative is to move to more loosely consistent approaches, where consistency is potentially sacrificed and conflict resolution becomes critical. Initial approaches leveraged clock synchronization and timestamping of the transactions to try to ensure that geographically diverse writes occurring almost at the same time can be distinguished by their timestamp and resolved accordingly.
In theory, if all participants share the same clock, this conflict resolution works for all cases except writes that truly occur at the same instant, which should be vanishingly rare. Still, these rare cases must be accounted for, because both conflicting writes would otherwise be acknowledged to their independent users. Further, maintaining and synchronizing atomic clocks to guarantee consistency is infeasible for most applications.
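Timestamp-based resolution is often implemented as a last-write-wins (LWW) register, sketched below. It assumes timestamps are comparable across sites, which is precisely the clock-synchronization problem just described; the node ID is a hypothetical tiebreaker for writes with identical timestamps.

```python
# Minimal last-write-wins (LWW) register sketch.
# Each write is a (timestamp, node_id, value) tuple; timestamps are
# assumed comparable across sites, and node_id breaks exact ties.
def lww_merge(a: tuple[float, str, str],
              b: tuple[float, str, str]) -> tuple[float, str, str]:
    """The write with the later timestamp wins; ties fall to node_id."""
    return max(a, b)  # tuple comparison: timestamp first, then node_id

la = (1700000000.005, "lax", "cart=3")  # write in Los Angeles
ny = (1700000000.002, "nyc", "cart=2")  # near-simultaneous write in New York
print(lww_merge(la, ny))  # the Los Angeles write wins on timestamp
```

Note that the "losing" write is silently discarded, which is exactly the consistency sacrifice the text describes.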
Other approaches leverage vector clocks to provide a causally consistent order of events among distributed entities. Every replication message among the participants contains the sending server’s scalar clock and its knowledge of the current scalar-clock value of all other participants in a vector with the data to be replicated. This ensures that all participants will eventually arrive at an agreement on what was written in what order. However, as the number of participants increases, the size of the vector increases. The burden on the conflict resolution algorithm also increases, since it has to rationalize the causal order among an increasing N-number of clocks.
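The vector-clock bookkeeping described above can be sketched as follows. This is a simplified illustration (node names are hypothetical); a real system attaches the vector to every replication message, as the text notes.

```python
# Sketch of vector-clock bookkeeping for causal ordering.
def vc_increment(clock: dict[str, int], node: str) -> dict[str, int]:
    """A node ticks its own slot before sending an event."""
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def vc_merge(local: dict[str, int], received: dict[str, int]) -> dict[str, int]:
    """On receipt, take the element-wise maximum of the two vectors."""
    nodes = set(local) | set(received)
    return {n: max(local.get(n, 0), received.get(n, 0)) for n in nodes}

def happened_before(a: dict[str, int], b: dict[str, int]) -> bool:
    """a causally precedes b iff a <= b element-wise and a != b."""
    nodes = set(a) | set(b)
    return all(a.get(n, 0) <= b.get(n, 0) for n in nodes) and a != b

a = vc_increment({}, "nyc")               # {'nyc': 1}
b = vc_merge(a, vc_increment({}, "lax"))  # {'nyc': 1, 'lax': 1}
print(happened_before(a, b))              # True: a causally precedes b
```

The vector grows with one slot per participant, which is the scaling burden the paragraph above points out.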
For now, application designers must consider the database needs relative to the advantages edge compute can bring. Many are opting for the benefits of centralized writes with distributed reads. Others continue to look for new ways to achieve both at the edge.
Distributed writes with CRDTs
Recently, there has been significant focus on data types that don’t require strict causal ordering to achieve consensus. These data types are great candidates for multiprimary models. The data types are known as conflict-free replicated data types (CRDTs). CRDTs are a unique set of data types that, due to their definition, essentially cannot conflict, so an optimistic replication approach can be used.
For example, commutative operations will always result in the same answer, regardless of the order of the operations. Addition of positive numbers, addition of negative numbers and multiplication are commutative. It doesn’t matter the order in which the operations occur; the cumulative result of the operations will always be the same, irrespective of the order.
2 + 3 + 4 = 2 + 4 + 3, or 2 + (−2) + 4 = −2 + 4 + 2, or 2 × 3 × 4 = 3 × 2 × 4
However, subtraction is not commutative: 2 − 3 ≠ 3 − 2.
Neither is division: 2 / 3 ≠ 3 / 2.
Data types can be defined to ensure they adhere to these behaviors. Examples include grow-only counters, positive-negative counters, grow-only sets, two-phase sets and so on. These are operation-based CRDTs; state-based CRDTs exist as well. There are issues with these, not the least of which relates to how to handle data types that don't fit the patterns above.
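The grow-only counter mentioned above is the simplest state-based CRDT and shows why no conflict resolution is needed: each node only increments its own slot, and merging takes the element-wise maximum, so merges are commutative, associative and idempotent. A minimal sketch (node names are illustrative):

```python
# Minimal grow-only counter (G-Counter) CRDT sketch.
# State maps each node to the count of increments that node performed.
def g_increment(state: dict[str, int], node: str, by: int = 1) -> dict[str, int]:
    """A node increments only its own slot."""
    state = dict(state)
    state[node] = state.get(node, 0) + by
    return state

def g_merge(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Merge is element-wise max: commutative, associative, idempotent."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}

def g_value(state: dict[str, int]) -> int:
    """The counter's value is the sum over all nodes' slots."""
    return sum(state.values())

# Two edge sites count independently, then merge in either order:
east = g_increment(g_increment({}, "nyc"), "nyc")  # nyc counted twice
west = g_increment({}, "lax")                      # lax counted once
assert g_merge(east, west) == g_merge(west, east)  # order doesn't matter
print(g_value(g_merge(east, west)))  # 3
```

A positive-negative counter is just a pair of these, one for increments and one for decrements, with the value being their difference.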
CRDTs in a 5G world
Ultimately, not all applications are conducive to CRDT-based database systems. An e-commerce app may want to localize state for a customer’s shopping cart or a cloud gaming app may hope to capture real-time performance stats at the edge. There is no shortage of mobile, immersive experiences requiring these application characteristics.
Simply put, even with the advent of CRDTs, the opportunity to create highly performant databases with distributed writes across a broad geography — beyond the availability zones (AZs) within a region (e.g., Aurora multimaster) — remains ever-present. Moreover, in a 5G Edge world, by extending the virtual private cloud (VPC) to more AZs than ever before, developers will have a unique opportunity to explore these opportunities by leveraging a new array of distributed database technologies across a multitude of geographies.
Could such testing have been conducted in a 4G world? Given that the 4G control-plane latency associated with the air interface was roughly comparable to the latency across domestic cloud regions (i.e., us-west-2 to us-east-2), there would have been limited value. In a 4G world, writing to a single master was, in most cases, good enough given the underlying mobile latency budget.
However, in a 5G world — one with significantly lower latency associated with the air interface — packets can suddenly reach the radio network, and in turn, the 5G Edge endpoints, at unprecedented speeds. With a distributed footprint of 5G Edge nodes, developers now have an arsenal of deployment strategies at their fingertips. As an example, by leveraging multi-master replication across multiple edge zones, developers can bring data closer to their end users without materially compromising availability or consistency.
What does this all mean?
Much like the content distribution networks (CDNs) of today, edge compute is a great fit for stateless applications and stateful applications that rely on local, monolithic databases. This will cover many applications that can leverage the low latency the 5G edge provides.
At the same time, applications that need access to real-time locally and nonlocally generated data require careful planning to ensure that the distribution provided by the 5G edge doesn’t impact the overall performance of the entire wide-scale application. Further, application developers should understand the nature of their transactions to determine if edge distribution achieves the latency gains they expect for write-heavy applications.
With no shortage of exciting deployment strategies, we’re excited to see the innovation in distributed computing in parallel with the growth of our 5G footprint and invite you to harness the benefits of 5G Edge today.