Why latency matters in a 5G Edge world

End-to-end delay and network latency in a nutshell


October 13, 2020 | Robert Belson, Corporate Strategy at Verizon

Historically, application developers and network architects have sought to optimize application latency to the nth degree through clever quality-of-experience (QoE) optimizations rather than by altering the underlying network connection. We have seen this especially with static content served via content delivery networks (CDNs).

But today, it is possible to reduce latency in the underlying network itself. That capability can enable greenfield application architectures, but it also raises a series of questions. In this post, we hope to create a dialogue around these questions as we build tools for developers to harness the powerful benefits of deploying real-world distributed applications at the network edge.

Application developers’ greatest test

Any front-end developer, back-end engineer or product manager is familiar with the continuous challenges associated with meeting customers’ application demands. A response time that was acceptable to a customer this year can result in churn next year as competing applications continue to advance the arms race. And as customers demand increasingly immersive, highly reliable and low-latency experiences, application developers will be put to the greatest test yet — rearchitecting existing applications to meet emerging and future customer needs.

This is no easy task. To start, let’s explore some decisions we often face as DevOps engineers and application architects:

  • An immersive media application requires near real-time response times in a mobile environment. Do you migrate larger compute components to the device (with resulting power and performance limitations) or keep them in the cloud at the expense of a potentially lower QoE?
  • A manufacturing and logistics application requires real-time tracking and analytics to optimize its supply chain and operations. Do you leverage inelastic compute resources on premises or migrate workloads to the cloud at the expense of network latency?
  • An e-commerce application requires low transaction latency to optimize conversion rates. In the event your application requires a significant volume of database write operations, do you rely on a single write node with strong consistency, or leverage geographically distributed write replicas and accept eventual consistency?

These decisions represent a juggling act of multiple variables: application performance requirements, user experience, cost, geographic distribution, consistency models — the list goes on. Up to this point, workarounds, including 4G LTE and home broadband networks coupled with the expansive footprint of a CDN, have been sufficient. But, as discussed above, we know user expectations are never static.

As users demand increasingly immersive experiences with minimal page load time, buffering and transaction latency, will the existing model of “good enough” ultimately be good enough?

A practical example

Let’s say I’m developing an application to be primarily consumed by an audience in Atlanta, Georgia. To deploy the application on Amazon Web Services (AWS), for example, I can optimize my application latency by selecting one (or both) of the following regions: US-East-2 (Ohio) and/or US-East-1 (Northern Virginia). Consequently, my network traffic flow has two options: a route roughly 600 miles to Northern Virginia or one roughly 600 miles to Ohio. But from a networking perspective, is 600 miles significant?

Physics would suggest that the end-to-end (E2E) latency of either route can be derived from the speed of light (SoL), which works out to approximately 3 ms. In reality, however, the end-to-end latency I observe will be far higher. To what can we attribute the remainder of the journey?
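For a rough sense of where that 3 ms comes from, here is a back-of-the-envelope sketch in Python. The 600-mile figure is from the example above; the assumption that light in fiber travels at roughly 0.68c is mine, included to show that even the theoretical floor creeps upward before a single router has touched the packet.

```python
# Back-of-the-envelope propagation estimate for the Atlanta example.
SPEED_OF_LIGHT_KM_S = 299_792     # c in vacuum, km/s
FIBER_VELOCITY_FACTOR = 0.68      # assumption: light in fiber travels at roughly 0.68c
MILES_TO_KM = 1.60934

distance_miles = 600              # Atlanta to US-East-1 or US-East-2, from the example
distance_km = distance_miles * MILES_TO_KM

one_way_vacuum_ms = distance_km / SPEED_OF_LIGHT_KM_S * 1000
one_way_fiber_ms = one_way_vacuum_ms / FIBER_VELOCITY_FACTOR

print(f"One-way floor in a vacuum: {one_way_vacuum_ms:.1f} ms")   # roughly 3.2 ms
print(f"One-way floor over fiber:  {one_way_fiber_ms:.1f} ms")    # roughly 4.7 ms
print(f"Round trip over fiber:     {2 * one_way_fiber_ms:.1f} ms")
```

Even under ideal conditions, the physics alone accounts for only a few milliseconds; everything beyond that comes from the network itself.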

Blast from the past

In 1974, network architects Vint Cerf and Bob Kahn explored the ways in which a set of distributed nodes could leverage packet switching for resource sharing. The result of these efforts was the TCP/IP protocol suite we cherish today. However, this protocol suite, and the “internet” more generally, was conceived as a “best-effort” service, one that guarantees no consistent quality-of-service (QoS) level. But why was this the case?

Beyond physical distance, the vast majority of a packet’s end-to-end journey can be characterized by four key areas:

  • Processing delay: While often on the order of microseconds, this covers bit-level error checking, packet header examination and the routing logic that determines the packet’s next hop
  • Queuing delay: As a result of routing convergence, congestion can occur when a particular router processes an unsustainable influx of packets. Because a router’s buffer has finite capacity, a packet’s queuing delay can range from zero to several milliseconds; at the upper limit, the packet is dropped and the client must resend it
  • Transmission delay: The time required to push all of a packet’s bits onto the link, which depends directly on the underlying link’s throughput
  • Propagation delay: Separate from pushing the bits onto the link, the time required for them to propagate to the far end of the link is determined by the underlying physical medium. This is where the approximately 3 ms in the example above comes from (a simple sketch combining all four components follows this list)
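To make these four contributors concrete, here is a minimal sketch that sums them for a single hop. Every input value (packet size, link rate, per-hop processing and queuing figures, hop distance) is an illustrative assumption rather than a measurement; the point is simply that only the propagation term is fixed by physics, while the others vary with conditions inside each router.

```python
# Minimal per-hop delay model: processing + queuing + transmission + propagation.
# All inputs below are illustrative assumptions for a single router hop.

def transmission_delay_ms(packet_bits: int, link_rate_bps: float) -> float:
    """Time to push every bit of the packet onto the link."""
    return packet_bits / link_rate_bps * 1000

def propagation_delay_ms(distance_km: float, medium_speed_km_s: float = 204_000) -> float:
    """Time for the bits to travel the link; ~204,000 km/s assumes light in fiber."""
    return distance_km / medium_speed_km_s * 1000

packet_bits = 1500 * 8     # a typical 1,500-byte Ethernet frame
link_rate_bps = 1e9        # assumed 1 Gbps link
processing_ms = 0.02       # assumed tens of microseconds of header/CRC work
queuing_ms = 0.5           # assumed light congestion; can swing to many ms, or a drop

per_hop_ms = (processing_ms
              + queuing_ms
              + transmission_delay_ms(packet_bits, link_rate_bps)
              + propagation_delay_ms(distance_km=50))

print(f"One hop under these assumptions: {per_hop_ms:.3f} ms")
print(f"Fifteen such hops:               {15 * per_hop_ms:.1f} ms")
```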

Thus, while traffic-engineering technologies (e.g., MPLS) and routing improvements have been introduced over the past few decades to mitigate latency and create more predictable paths, there are no guarantees of low latency, even in the fastest networks.

Network latency in a nutshell

In almost 50 years of networking innovation, the underlying concept of latency remains staggeringly similar. As a packet traverses the internet, it hops across a variety of routers and switches. At each router or switch (think of a traffic junction), packets are either forwarded to their next hop or buffered as a result of network congestion. Said differently, any given hop can introduce unbounded network delays. The higher the number of network hops, the greater the likelihood of increased E2E latency. Therefore, latency today can often be mitigated by reducing the number of network hops a packet traverses.[1]
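One crude way to observe this from the developer’s seat is to measure round-trip time yourself. The sketch below times a plain TCP handshake from the client; the hostname is a placeholder, and this captures only the aggregate round trip, not the per-hop breakdown a tool like traceroute would show.

```python
# Rough client-side RTT probe: time a TCP handshake to an application endpoint.
# The hostname below is a placeholder; swap in your own origin or edge endpoint.
import socket
import statistics
import time

def tcp_handshake_rtt_ms(host: str, port: int = 443, timeout: float = 2.0) -> float:
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; we only care about the handshake time
    return (time.perf_counter() - start) * 1000

samples = [tcp_handshake_rtt_ms("example.com") for _ in range(5)]
print(f"median RTT: {statistics.median(samples):.1f} ms, "
      f"min: {min(samples):.1f} ms, max: {max(samples):.1f} ms")
```

Comparing the median against the spread between minimum and maximum gives a rough feel for how much of the delay is fixed propagation versus variable queuing.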

Unfortunately, as a developer, I have no way to control the realized latency of the network — after all, the network was conceived as a best-effort service.

Enter mobile edge computing (MEC).

The promise of MEC

Through our deep dive into characterizing end-to-end delay in networks, it is clear that MEC will fundamentally alter what it means to deploy to the network edge. By moving compute resources closer to the end user than ever before, MEC drives us further toward a world in which propagation delay accounts for essentially all of the total delay. As you continue to develop cloud-native applications in the years to come, keep the following friendly reminders in mind:

Simply put, distance always matters. Because the distance between source and destination directly affects the propagation delay, and by extension the E2E transport latency, minimizing the distance from end users to compute resources has long been a challenge for application developers. With MEC, edge zones will offer a new geographic horizon of growth and an increased level of flexibility, elasticity and scalability.
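To put a number on “distance always matters,” the short comparison below reuses the propagation math from earlier for the Atlanta example: the roughly 600-mile haul to a traditional cloud region versus a hypothetical metro-area edge zone a few tens of miles away. The 20-mile edge distance is purely an assumption for illustration.

```python
# Propagation-delay comparison: distant cloud region vs. a nearby edge zone.
FIBER_SPEED_KM_S = 204_000   # assumed speed of light in fiber (~0.68c)
MILES_TO_KM = 1.60934

def one_way_ms(distance_miles: float) -> float:
    return distance_miles * MILES_TO_KM / FIBER_SPEED_KM_S * 1000

scenarios = {
    "Atlanta -> regional cloud (~600 mi, from the example)": 600,
    "Atlanta -> hypothetical metro edge zone (~20 mi)": 20,
}

for label, miles in scenarios.items():
    print(f"{label}: ~{2 * one_way_ms(miles):.2f} ms round trip")
```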

Don’t forget about the “5G” in 5G Edge

Even after enabling Wavelength Zones and configuring your compute environment in the newly created subnet, your application can only perform as well as the underlying network. With 5G and 5G Edge together, you can confidently scale your infrastructure to provide end users with the highest quality of experience across multiple dimensions of performance, including latency, bandwidth, reliability, security and device density.
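For context, here is a hedged boto3 sketch of what “enabling Wavelength Zones and configuring your compute environment in the newly created subnet” can look like. The zone-group and zone names, VPC ID and CIDR block are illustrative placeholders tied to the Atlanta example; treat the AWS Wavelength documentation as the authoritative workflow.

```python
# Illustrative sketch: opt in to a Wavelength Zone group and carve out a subnet there.
# Zone names, VPC ID and CIDR below are placeholders, not prescriptive values.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# 1. Opt in to the Wavelength Zone group (here, the Atlanta group as an example).
ec2.modify_availability_zone_group(GroupName="us-east-1-wl1", OptInStatus="opted-in")

# 2. Create a subnet pinned to the Wavelength Zone inside an existing VPC.
subnet = ec2.create_subnet(
    VpcId="vpc-0123456789abcdef0",               # placeholder VPC
    CidrBlock="10.0.8.0/24",                     # placeholder CIDR
    AvailabilityZone="us-east-1-wl1-atl-wlz-1",  # Atlanta Wavelength Zone
)["Subnet"]

# 3. Route traffic from that subnet toward mobile devices through a carrier gateway.
cgw = ec2.create_carrier_gateway(VpcId="vpc-0123456789abcdef0")["CarrierGateway"]
rtb = ec2.create_route_table(VpcId="vpc-0123456789abcdef0")["RouteTable"]
ec2.create_route(RouteTableId=rtb["RouteTableId"],
                 DestinationCidrBlock="0.0.0.0/0",
                 CarrierGatewayId=cgw["CarrierGatewayId"])
ec2.associate_route_table(RouteTableId=rtb["RouteTableId"], SubnetId=subnet["SubnetId"])
```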

[1] The number of hops can influence the E2E latency, but speed-of-light distance and network subscription density over a particular link will always remain the principal contributors.

