It’s Time for Enterprise Networking to Embrace Cloud Architectures

I’ll start at the end. Cloud computing is now the vernacular for computing. Cloud networking will, within the next 24 months, be the vernacular for networking. The same paradigms that have revolutionized computing will do so for networking.

Monolithic architectures gave way to client/server architectures, which then evolved into service-oriented architectures, which have in turn given way to the now ubiquitous microservices/container model. This microservices architecture is the mainstay of cloud and public cloud computing, as well as serverless/utility computing models. Cloud software architectures bring numerous benefits to applications, including:

  • Horizontal scale
  • Use of resource pools for near unlimited capacity
  • Distributed services and databases
  • Fault tolerance and containerization for hitless “restartability”
  • In-service upgrades
  • Programmability, both northbound and southbound, for flexible integration across services
  • Programming language independence

It is these attributes that we see (for the most part) in large, global SaaS applications such as Amazon’s e-commerce website, Netflix’s streaming service, and the Facebook and Twitter social networks. The same capabilities – with the same global reach, high availability, and horizontal scale – can be applied to enterprise networking.

The heart of networking is routing. Routing algorithms have maintained the same architecture for the past 30 years. Border Gateway Protocol (BGP4), the routing protocol of the Internet, has been in use since 1994. Routing protocols are designed for resiliency and autonomous operation. Each router or autonomous system can be an island unto itself, needing only visibility and connectivity to its directly attached neighbors. This architecture has allowed for the completely decentralized and highly resilient operation of BGP routing, yet it has also introduced challenges. Scaling and convergence problems continually plague BGP operations and Internet performance. There have been proposals to replace BGP, but its installed base makes that nearly impossible. The next best option is to augment it.

The most common mechanism for augmentation is to build an overlay network. An overlay network uses the BGP4-powered Internet as a foundation and bypasses BGP routing using alternative routing protocols. This approach combines the best of BGP routing – resiliency and global availability – with the performance and scale improvements of new and innovative routing protocols. The overlay model and these new routing protocols open the door to routing based on performance metrics and application awareness, and the potential to bring LAN-like performance to the Internet-powered WAN. This is at the heart of the cloud networking evolution and software-defined networking moving forward.

Building atop BGP4’s flat, decentralized architecture, new routing protocols leverage cloud software architectures to deliver fast, scalable, performance-driven routing, embracing both the centralized and the distributed nature of cloud computing. The Internet, acting as the underlying network, provides basic connectivity. A broad network of sensors, small microservices deployed across major points of presence globally, runs simple performance tests at set intervals and feeds the results to a centralized, hierarchical routing engine. These basic tests provide insight into throughput, loss, and latency at key points of presence. The routing engine then applies deep learning to the performance data, both current and historical, to create routes. Routing updates are pushed to the overlay network routers, which then update their forwarding tables. Route hierarchy brings scale and resiliency. For example, should connectivity to the centralized routing engine be lost, routing persists via router-to-router updates and, in the case of a prolonged outage, by falling back to the underlying network.
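The measurement half of this pipeline can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the `TelemetryStore` and `LinkStats` names, the field names, and the smoothing factor are all assumptions, and a plain exponentially weighted moving average stands in for the deep learning model, simply to show how current and historical samples might be blended per link.

```python
from dataclasses import dataclass


@dataclass
class LinkStats:
    """Smoothed view of one overlay link (field names are illustrative)."""
    latency_ms: float
    loss_pct: float
    throughput_mbps: float


class TelemetryStore:
    """Hypothetical store that blends each new sensor sample with history."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha  # weight of the newest sample (assumed value)
        self.links = {}     # (src, dst) -> LinkStats

    def ingest(self, src, dst, latency_ms, loss_pct, throughput_mbps):
        """Record one sensor measurement and return the updated link stats."""
        key = (src, dst)
        old = self.links.get(key)
        if old is None:
            # First sample for this link: nothing to blend with yet.
            stats = LinkStats(latency_ms, loss_pct, throughput_mbps)
        else:
            a = self.alpha
            stats = LinkStats(
                a * latency_ms + (1 - a) * old.latency_ms,
                a * loss_pct + (1 - a) * old.loss_pct,
                a * throughput_mbps + (1 - a) * old.throughput_mbps,
            )
        self.links[key] = stats
        return stats
```

A routing engine would read `links` when it recomputes routes; a larger `alpha` makes routes react faster to change at the cost of more churn.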

Key elements deliver benefits

There are a few key elements of centralized overlay routing that are really novel:

Performance as a metric: BGP does not factor performance into route calculations, so it is possible (if not probable) that one or more poorly performing links will be used, impacting application performance. This manifests itself as poor TCP performance (which leads to degraded throughput) and as high loss, which hurts real-time applications. The use of performance data in centralized overlay routing introduces the capability to route not just based on hop count or least cost, but also by best performance.
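To make the difference concrete, here is a small sketch that runs the same shortest-path search twice over a toy topology: once counting hops and once summing measured latency. The node names, latency figures, and the `best_path` helper are invented for illustration; real overlay routing engines are not necessarily built on Dijkstra's algorithm.

```python
import heapq


def best_path(graph, src, dst, cost):
    """Dijkstra's algorithm; cost(link) must return a non-negative weight."""
    queue = [(0.0, src, [src])]
    visited = set()
    while queue:
        total, node, path = heapq.heappop(queue)
        if node == dst:
            return path, total
        if node in visited:
            continue
        visited.add(node)
        for nxt, link in graph.get(node, {}).items():
            if nxt not in visited:
                heapq.heappush(queue, (total + cost(link), nxt, path + [nxt]))
    return None, float("inf")


# Toy topology: the direct A-B link is short in hops but slow.
GRAPH = {
    "A": {"B": {"latency_ms": 120.0}, "C": {"latency_ms": 20.0}},
    "C": {"D": {"latency_ms": 25.0}},
    "D": {"B": {"latency_ms": 15.0}},
}

by_hops = best_path(GRAPH, "A", "B", cost=lambda link: 1)
by_latency = best_path(GRAPH, "A", "B", cost=lambda link: link["latency_ms"])
```

With these numbers the hop-count search picks the direct A-B link (one hop, 120 ms), while the latency-based search picks A-C-D-B (three hops, 60 ms).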

Application-specific routing: Using performance telemetry for routing enables routes with an application bias. High-throughput routes can be used for file transfers, and low-loss, low-latency routes can be used for real-time applications such as voice or video.
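One simple way to picture application bias is as a per-application weighting of the same telemetry. The sketch below is an assumption-laden illustration: the profile names, weights, route names, and metric values are all made up, and a weighted sum is only one of many possible scoring schemes.

```python
# Hypothetical per-application weights: reward throughput, penalize latency and loss.
PROFILES = {
    "file_transfer": {"throughput_mbps": 1.0, "latency_ms": 0.1, "loss_pct": 10.0},
    "voice": {"throughput_mbps": 0.01, "latency_ms": 1.0, "loss_pct": 100.0},
}

# Hypothetical end-to-end metrics for two candidate overlay routes.
ROUTES = {
    "via_frankfurt": {"throughput_mbps": 900.0, "latency_ms": 140.0, "loss_pct": 0.8},
    "via_london": {"throughput_mbps": 300.0, "latency_ms": 45.0, "loss_pct": 0.1},
}


def score(route, weights):
    """Higher is better: throughput counts for, latency and loss count against."""
    return (weights["throughput_mbps"] * route["throughput_mbps"]
            - weights["latency_ms"] * route["latency_ms"]
            - weights["loss_pct"] * route["loss_pct"])


def pick_route(app, routes):
    """Return the name of the route that scores best for the given application."""
    weights = PROFILES[app]
    return max(routes, key=lambda name: score(routes[name], weights))
```

Here `pick_route("file_transfer", ROUTES)` selects the high-throughput Frankfurt route, while `pick_route("voice", ROUTES)` selects the low-latency, low-loss London route.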

High availability: The use of proven, battle-tested cloud software architecture for cloud networking ensures that centralized routing is not only resilient but also highly available on a number of levels. The use of distributed microservices and the capability to “restart” individual services on the fly without a service outage – a key element of cloud software architecture – combined with the safety net of reverting to underlay BGP4 routing, ensures packets continue to flow even in the event of something catastrophic.
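The layered safety net can be expressed as a simple preference order on each overlay router: use the centrally computed route while it is fresh, fall back to a route learned from a peer router, and finally hand traffic to the underlay. The class, field names, and timeout values below are hypothetical and chosen only to illustrate that ordering.

```python
from dataclasses import dataclass
from typing import Optional

UNDERLAY = "underlay"  # sentinel: forward over the ordinary BGP-routed Internet path


@dataclass
class OverlayRouter:
    """Hypothetical router state; ages are seconds since the last update."""
    controller_route: Optional[str] = None
    controller_age_s: float = 0.0
    peer_route: Optional[str] = None
    peer_age_s: float = 0.0
    max_controller_age_s: float = 30.0  # assumed freshness limit
    max_peer_age_s: float = 300.0       # assumed freshness limit

    def next_hop(self) -> str:
        # 1. Prefer the route pushed by the centralized routing engine.
        if self.controller_route and self.controller_age_s <= self.max_controller_age_s:
            return self.controller_route
        # 2. Otherwise use the most recent router-to-router update.
        if self.peer_route and self.peer_age_s <= self.max_peer_age_s:
            return self.peer_route
        # 3. Prolonged outage: bypass the overlay entirely.
        return UNDERLAY
```

For example, `OverlayRouter(controller_route="pop-lon", controller_age_s=120.0, peer_route="pop-ams", peer_age_s=60.0).next_hop()` returns the peer-learned route because the controller route has gone stale.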

Native integration into SD-WAN and SDN: As SD-WAN continues to overtake the WAN edge, support for centralized routing will continue to grow. Progressive SD-WAN vendors are today starting to utilize overlay networks and centralized routing, demonstrating its viability.

Networking is evolving, embracing cloud software architectures and techniques. It is pushing into the enterprise from two sides: from the data center and from the WAN edge. This push is accelerated by the approach of augmenting Internet technologies rather than replacing them outright, enabling enterprises to deploy these new technologies across their networks quickly. The effects are immediate and noticeable, as the performance of critical business applications improves within the enterprise, across the enterprise WAN, and with enterprise SaaS applications and cloud workloads.