Multi-Cloud Network Architecture, Part 1
June 26, 2023

First part of a two-part blog post by Mihai Dumitru, Chief Architect at Arctic Stream, about a multi-cloud IT infrastructure that integrates Cisco SD-WAN, AWS and Azure.


High-Level Design

Requirements

The IT infrastructure outlined in this blog post has been built from scratch with a few goals in mind:

  • Take advantage of the newest and best available technologies, since there were no legacy constraints
  • Rely on tried-and-proven security principles and technologies
  • Make it as flexible as possible (add and remove components as needed, scale up and down without significant design changes)
  • Make it highly available without wasting resources

Once we look at the details, however, these business goals turn into conflicting design constraints. Restated as technical needs, they are:

  • Integration of multiple clouds (AWS for its feature set, Cisco SD-WAN for secure and flexible branch networks, Azure for endpoint management, security and productivity tools)
  • Logical segmentation, i.e., separation of the application environments in the AWS cloud, for security reasons
  • Inspection of traffic between the AWS application environments themselves (east-west, or E-W, traffic) and between the application environments and the Internet (north-south, or N-S, traffic), also for security reasons
  • Use of more capable third-party security appliances instead of the cloud-native ones
  • Scalability without significant design changes
  • High availability and load balancing

The hardest problem is keeping the application environments secure and highly available while using third-party security appliances for segmentation and traffic inspection. Centralized inspection with AWS Gateway Load Balancer (GWLB) and Transit Gateway (TGW) is meant to solve this problem.

The other problem is building a scalable and resilient WAN that reaches both the AWS and Azure clouds and also offers an alternative path between AWS and Azure resources, for the sake of routing simplicity. The Cisco SD-WAN solution integrates well with both AWS Transit Gateway and Azure Virtual WAN.

Centralized Inspection with AWS Gateway Load Balancer and Transit Gateway

The Basic Concepts

Until AWS launched Gateway Load Balancer (GWLB) in late 2020, there was no easy way to deploy, scale, and provide high availability for third-party virtual network appliances in AWS. Amazon had already released VPC ingress routing, which made it easier to route traffic to specific EC2 instances within a VPC, but a VPC route table still could not target a load balancer or an interface VPC endpoint. By combining a transparent network gateway with a load balancer, GWLB meets this requirement.

Virtual network appliances sit inline with network traffic and inspect inbound and outbound traffic flows. This type of transparent insertion goes by many names, such as service chaining, bump in the wire, or network functions virtualization (NFV).

A Gateway Load Balancer (GWLB) is a managed service (the control plane) that makes it easy for customers to deploy and manage a fleet of horizontally scalable, inline virtual network appliances in a transparent manner, for purposes such as security inspection, compliance, policy controls, and other networking services.

A Gateway Load Balancer Endpoint (GWLBE) is the data plane component of GWLB: a type of VPC endpoint that customers can place flexibly in both centralized and distributed deployments. GWLBEs are powered by AWS PrivateLink, which lets you place your service across many accounts and VPCs without losing centralized control and administration.

https://wp.arcticstream.ro/wp-content/uploads/2024/10/Picture1-1024x513.jpg
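For stateful inspection to work, GWLB keeps each flow pinned to one appliance in the fleet; AWS documents 5-tuple flow stickiness for TCP/UDP traffic by default. The Python sketch below only illustrates that idea. The hash function, the appliance names (`fw-1`, `fw-2`, `fw-3`) and the addresses are hypothetical, and AWS's actual target-selection algorithm is not reproduced here. Ordering the two endpoints before hashing is what makes a request and its reply land on the same appliance.

```python
import hashlib
from typing import List, NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


def flow_key(ft: FiveTuple) -> bytes:
    """Direction-independent key: sort the two endpoints so that the
    request and the reply of the same flow produce the same key."""
    lo, hi = sorted([(ft.src_ip, ft.src_port), (ft.dst_ip, ft.dst_port)])
    return f"{lo[0]}:{lo[1]}|{hi[0]}:{hi[1]}|{ft.protocol}".encode()


def pick_appliance(ft: FiveTuple, appliances: List[str]) -> str:
    """Hypothetical stand-in for GWLB target selection (not AWS's algorithm)."""
    digest = hashlib.sha256(flow_key(ft)).digest()
    return appliances[int.from_bytes(digest[:8], "big") % len(appliances)]


def reverse(ft: FiveTuple) -> FiveTuple:
    """The same flow seen in the opposite direction (server -> client)."""
    return FiveTuple(ft.dst_ip, ft.dst_port, ft.src_ip, ft.src_port, ft.protocol)


fleet = ["fw-1", "fw-2", "fw-3"]  # hypothetical appliance targets
request = FiveTuple("10.1.0.10", 50432, "203.0.113.5", 443, "tcp")
# Both directions of the flow are sent to the same appliance.
print(pick_appliance(request, fleet), pick_appliance(reverse(request), fleet))
```

A new flow (for example, a different client port) may hash to a different appliance, which is how load is spread across the fleet while each individual flow stays sticky.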

An Elastic IP address is a static public IPv4 address, reachable from the Internet and designed for dynamic cloud computing. It is allocated to an AWS account and can be published in public DNS.

https://wp.arcticstream.ro/wp-content/uploads/2023/06/transit_gateway_0642552978.jpeg

An AWS Transit Gateway provides a hub and spoke design for connecting VPCs and on-premises networks as a fully managed service. No VPN overlay is required, and AWS manages high availability and scalability. Transit Gateway controls how traffic is routed among all the connected spoke networks using route tables. This hub-and-spoke model simplifies management and reduces operational costs because VPCs only connect to the Transit Gateway instance to gain access to the connected networks. Transit Gateway is a Regional resource and can connect thousands of VPCs within the same AWS Region.
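Transit Gateway route tables work through associations and propagations: each attachment is associated with one route table, which is consulted for traffic arriving from that attachment, and an attachment can propagate its CIDR into one or more route tables; lookups use longest prefix match. The Python model below is a minimal sketch of that behavior, showing how it can enforce segmentation. It is not the AWS API, and the table names, attachment names and CIDRs are hypothetical.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TgwRouteTable:
    """Simplified TGW route table: destination prefix -> attachment name."""
    routes: Dict[ipaddress.IPv4Network, str] = field(default_factory=dict)

    def add_route(self, cidr: str, attachment: str) -> None:
        self.routes[ipaddress.ip_network(cidr)] = attachment

    def lookup(self, dst_ip: str) -> Optional[str]:
        addr = ipaddress.ip_address(dst_ip)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None  # no matching route: the packet is dropped
        # Longest prefix match: the most specific route wins.
        return self.routes[max(matches, key=lambda net: net.prefixlen)]


@dataclass
class TransitGateway:
    tables: Dict[str, TgwRouteTable] = field(default_factory=dict)
    associations: Dict[str, str] = field(default_factory=dict)  # attachment -> table

    def associate(self, attachment: str, table: str) -> None:
        # An attachment is associated with exactly one route table.
        self.associations[attachment] = table

    def propagate(self, attachment: str, cidr: str, table: str) -> None:
        # Install the attachment's CIDR as a route in the given table.
        self.tables[table].add_route(cidr, attachment)

    def forward(self, ingress_attachment: str, dst_ip: str) -> Optional[str]:
        # Use the table associated with the attachment the packet arrived on.
        return self.tables[self.associations[ingress_attachment]].lookup(dst_ip)


# Hypothetical setup: two isolated spoke VPCs that can only reach shared services.
tgw = TransitGateway(tables={"spoke-rt": TgwRouteTable(), "shared-rt": TgwRouteTable()})
for att in ("att-spoke-a", "att-spoke-b"):
    tgw.associate(att, "spoke-rt")
tgw.associate("att-shared", "shared-rt")
tgw.propagate("att-shared", "10.100.0.0/16", "spoke-rt")
tgw.propagate("att-spoke-a", "10.1.0.0/16", "shared-rt")
tgw.propagate("att-spoke-b", "10.2.0.0/16", "shared-rt")

print(tgw.forward("att-spoke-a", "10.100.0.10"))  # att-shared
print(tgw.forward("att-spoke-a", "10.2.0.5"))     # None: spokes are isolated
print(tgw.forward("att-shared", "10.2.0.5"))      # att-spoke-b
```

The spokes stay apart simply because their shared route table only contains the shared-services route. In the centralized inspection patterns described in this post, the spoke table would instead carry a default route pointing at the inspection VPC attachment.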

Some network appliances, such as stateful firewalls, need to see both directions of a network flow in order to distinguish valid traffic from potential threats or attacks. In these scenarios, asymmetric routing can undermine stateful solutions. By default, AWS Transit Gateway tries to keep traffic within the availability zone in which it originated. While this behavior is optimal for most architectures, it is not ideal for centralized stateful firewall deployments that span multiple availability zones, because it can introduce asymmetric routing and disrupt traffic. To solve this problem, AWS Transit Gateway now supports appliance mode, which ensures that both directions of a flow are forwarded to the same VPC attachment (symmetric routing). This removes the need for complex workarounds, such as source NAT, to force traffic back to the correct appliance.

Patterns for Inline Inspection – N/S and E/W Traffic

Customers that implement inline appliances typically follow one of these architectural patterns:

  • Single VPC with north/south connectivity
  • Many VPCs with centralized north/south connectivity
  • Many VPCs with centralized east/west connectivity
  • Single VPC with east/west connectivity between subnets in the same VPC

The terms north/south and east/west have long been used in traditional networking and data center environments to describe traffic flows. North/south generally refers to traffic entering or leaving the network or data center, most commonly traffic coming from or going to the Internet. East/west refers to traffic flowing between resources within the data center or network.
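In practice, whether a flow counts as north/south or east/west comes down to whether both endpoints belong to the organization's own address space. A minimal classifier, assuming a hypothetical address plan in which all internal VPC CIDRs are known up front:

```python
import ipaddress

# Hypothetical address plan: the CIDRs of all internal VPCs.
INTERNAL_CIDRS = [ipaddress.ip_network(c) for c in ("10.1.0.0/16", "10.2.0.0/16")]


def is_internal(ip: str) -> bool:
    """True if the address belongs to one of the internal VPC CIDRs."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in INTERNAL_CIDRS)


def classify(src_ip: str, dst_ip: str) -> str:
    """East/west if both endpoints are internal, otherwise north/south."""
    if is_internal(src_ip) and is_internal(dst_ip):
        return "east/west"
    return "north/south"


print(classify("10.1.0.10", "10.2.0.20"))    # east/west
print(classify("10.1.0.10", "203.0.113.5"))  # north/south
```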

With the model of single VPC with north/south connectivity, one has the flexibility to centralize or distribute the control plane and maintain a distributed data plane. No Transit Gateway is required.

Single VPC, Distributed Control and Data Plane (GWLB and GWLBE)

https://wp.arcticstream.ro/wp-content/uploads/2023/06/Picture1-1.jpg

Single VPC, Centralized Control Plane (GWLB), Distributed Data Plane (GWLBE)

https://wp.arcticstream.ro/wp-content/uploads/2023/06/Picture2.jpg
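The single-VPC pattern relies on an edge route table associated with the Internet Gateway: it sends traffic destined for the application subnet to the GWLBE, while the application subnet's default route points at the same GWLBE, so both directions are inspected. The sketch below traces both directions through hypothetical route tables; the table names and CIDRs are assumptions, and the GWLB-to-appliance leg is collapsed into a single step.

```python
import ipaddress
from typing import List

VPC_CIDR = "10.0.0.0/16"
APP_SUBNET = "10.0.1.0/24"  # hypothetical public subnet hosting the application

# Route table name -> {destination prefix: target}. All values are hypothetical.
ROUTE_TABLES = {
    # Edge route table associated with the Internet Gateway.
    "igw-edge": {VPC_CIDR: "local", APP_SUBNET: "gwlbe"},
    "app-subnet": {VPC_CIDR: "local", "0.0.0.0/0": "gwlbe"},
    "gwlbe-subnet": {VPC_CIDR: "local", "0.0.0.0/0": "igw"},
}


def lookup(table: str, dst_ip: str) -> str:
    """Longest prefix match; assumes at least one route matches."""
    addr = ipaddress.ip_address(dst_ip)
    routes = {ipaddress.ip_network(p): t for p, t in ROUTE_TABLES[table].items()}
    best = max((net for net in routes if addr in net), key=lambda net: net.prefixlen)
    return routes[best]


def trace(start_table: str, dst_ip: str, max_hops: int = 5) -> List[str]:
    path, table = [], start_table
    for _ in range(max_hops):
        target = lookup(table, dst_ip)
        path.append(f"{table} -> {target}")
        if target == "gwlbe":
            # GWLB sends the packet to an appliance and back to the endpoint;
            # the packet then continues from the GWLBE subnet's route table.
            path.append("GWLB -> appliance -> GWLB")
            table = "gwlbe-subnet"
        elif target == "local":
            return path + ["delivered inside the VPC"]
        elif target == "igw":
            return path + ["out to the Internet"]
    raise RuntimeError("routing loop")


for hop in trace("igw-edge", "10.0.1.10"):      # inbound from the Internet
    print(hop)
for hop in trace("app-subnet", "203.0.113.5"):  # outbound to the Internet
    print(hop)
```

The inbound path only works because the edge route table may hold a route (the application subnet) that is more specific than the VPC CIDR.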

When many VPCs require centralized north/south connectivity, the inline bump-in-the-wire functionality is moved into a centralized VPC. This is typically done with AWS Transit Gateway, which also helps separate duties between the accounts that perform the inline functionality and the spoke VPCs that host applications. In this pattern, both control plane and data plane functionality are centralized.

Many VPCs, Centralized Control and Data Plane, North/South traffic (GWLB and GWLBE plus TGW)

https://wp.arcticstream.ro/wp-content/uploads/2023/06/Picture3-1024x415.jpg

The centralized VPC can also address connectivity between two or more VPCs and is often combined with the centralized north/south connectivity model over AWS Transit Gateway.

Many VPCs, Centralized Control and Data Plane, East/West traffic (GWLB and GWLBE plus TGW)

https://wp.arcticstream.ro/wp-content/uploads/2023/06/Picture4-1024x456.jpg
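In the centralized patterns, a packet crosses several route tables: the spoke VPC's subnet table, the TGW route table associated with the spoke attachments, the inspection VPC's subnet tables, and the TGW route table associated with the inspection attachment. The trace below follows an east/west flow and a north/south flow through one plausible configuration. All table names, CIDRs and targets are hypothetical, and the NAT gateway leg is simplified to a single terminal step.

```python
import ipaddress
from typing import List

# Route table name -> {destination prefix: target}. Everything here is hypothetical.
ROUTE_TABLES = {
    "spoke-a-subnet": {"10.1.0.0/16": "local", "0.0.0.0/0": "tgw"},
    "spoke-b-subnet": {"10.2.0.0/16": "local", "0.0.0.0/0": "tgw"},
    # TGW table associated with both spoke attachments: everything goes to inspection.
    "tgw-spoke-rt": {"0.0.0.0/0": "att-inspection"},
    # Inspection VPC: subnet holding the TGW attachment ENIs.
    "inspection-tgw-subnet": {"10.100.0.0/16": "local", "0.0.0.0/0": "gwlbe"},
    # Inspection VPC: subnet holding the GWLB endpoints (traffic after inspection).
    "inspection-gwlbe-subnet": {"10.100.0.0/16": "local", "10.0.0.0/8": "tgw",
                                "0.0.0.0/0": "nat"},
    # TGW table associated with the inspection attachment: routes back to the spokes.
    "tgw-inspection-rt": {"10.1.0.0/16": "att-spoke-a", "10.2.0.0/16": "att-spoke-b"},
}

# (current table, target) -> next route table consulted.
NEXT_TABLE = {
    ("spoke-a-subnet", "tgw"): "tgw-spoke-rt",
    ("spoke-b-subnet", "tgw"): "tgw-spoke-rt",
    ("tgw-spoke-rt", "att-inspection"): "inspection-tgw-subnet",
    # GWLB -> appliance -> GWLB, then the GWLBE subnet's table is used.
    ("inspection-tgw-subnet", "gwlbe"): "inspection-gwlbe-subnet",
    ("inspection-gwlbe-subnet", "tgw"): "tgw-inspection-rt",
}
TERMINAL = {
    "local": "delivered within the same VPC",
    "att-spoke-a": "delivered to spoke VPC A",
    "att-spoke-b": "delivered to spoke VPC B",
    "nat": "NAT gateway -> Internet",
}


def lookup(table: str, dst_ip: str) -> str:
    """Longest prefix match; assumes at least one route matches."""
    addr = ipaddress.ip_address(dst_ip)
    routes = {ipaddress.ip_network(p): t for p, t in ROUTE_TABLES[table].items()}
    best = max((n for n in routes if addr in n), key=lambda n: n.prefixlen)
    return routes[best]


def trace(start_table: str, dst_ip: str, max_hops: int = 8) -> List[str]:
    path, table = [], start_table
    for _ in range(max_hops):
        target = lookup(table, dst_ip)
        path.append(f"{table}: {target}")
        if target in TERMINAL:
            return path + [TERMINAL[target]]
        table = NEXT_TABLE[(table, target)]
    raise RuntimeError("routing loop")


print(*trace("spoke-a-subnet", "10.2.0.20"), sep="\n")    # east/west
print(*trace("spoke-a-subnet", "203.0.113.5"), sep="\n")  # north/south
```

Both flows pass through the GWLBE in the inspection VPC; only the last lookup differs (back to TGW for another spoke, to the NAT gateway for the Internet).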

Single-VPC east/west inspection between subnets of the same VPC is not a supported configuration, because a VPC route table does not allow routes more specific than the VPC’s CIDR allocation. As a result, a VPC route table cannot redirect traffic through an appliance when the source and destination are in the same VPC. There is one exception to this rule: edge route tables. An edge route table, associated with the Internet Gateway, allows traffic entering the VPC from the Internet to be routed through a Gateway Load Balancer endpoint.

For the same reason, the placement of inline functions for Internet-bound workloads must be considered as well. If an AWS NAT Gateway or self-managed NAT instances are used, the GWLBE must sit directly between the Internet Gateway and the publicly facing resource. For traffic coming in from the Internet, the inline functionality (the GWLBE) cannot be placed after the NAT, between it and the private instances, because that would require a more specific route within the VPC route table.

https://wp.arcticstream.ro/wp-content/uploads/2024/10/Picture5-1024x510.jpg

Patterns for Inline Inspection – Taking into Account the Firewall State

Asymmetric routing describes the case where a client’s request to a server traverses a different network path than the server’s reply. If the asymmetric return path sends the packet through a different firewall, valid traffic can be discarded because of connection tracking, a core component of stateful firewalls. A stateful firewall tracks a connection or network flow for the entire length of a transaction (for TCP, from the initial SYN packet to the final FIN). For a firewall to track a connection effectively, the network must ensure that the connection’s packets reach the same firewall instance in both directions (client-to-server and server-to-client). If an active flow is routed to a firewall that is not tracking its state, packets can be dropped unexpectedly.
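Connection tracking can be reduced to a table of flows that a given firewall instance has seen open. The toy model below uses hypothetical class and instance names and omits policy checks, sequence numbers and connection teardown; it only shows why a reply that lands on a different instance is dropped, while the same reply through the original instance is accepted.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

Flow = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port)


@dataclass
class StatefulFirewall:
    name: str
    conntrack: Set[Flow] = field(default_factory=set)  # flows this instance tracks

    def process(self, src: str, sport: int, dst: str, dport: int, flags: str) -> bool:
        """Return True if the packet is forwarded, False if it is dropped."""
        flow = (src, sport, dst, dport)
        reverse = (dst, dport, src, sport)
        if flags == "SYN":
            # New connection: start tracking it (policy assumed to allow it).
            self.conntrack.add(flow)
            return True
        # Any other packet must belong to a flow this instance already tracks,
        # in either direction; otherwise it is dropped.
        return flow in self.conntrack or reverse in self.conntrack


fw_a, fw_b = StatefulFirewall("fw-az-a"), StatefulFirewall("fw-az-b")
fw_a.process("10.1.0.10", 50000, "10.2.0.20", 443, "SYN")               # request via fw-az-a
print(fw_b.process("10.2.0.20", 443, "10.1.0.10", 50000, "SYN-ACK"))  # False: asymmetric
print(fw_a.process("10.2.0.20", 443, "10.1.0.10", 50000, "SYN-ACK"))  # True: symmetric
```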

By default, AWS Transit Gateway applies a routing algorithm optimized to maintain availability zone affinity. While this is ideal for performance and availability, it can cause problems for centralized firewall architectures, especially in the centralized east/west pattern, where the source and the destination may be in different availability zones.

With AWS Transit Gateway appliance mode, you can specify attachments that should forward a network flow out of the same availability zone, regardless of the flow’s direction and of the availability zone in which it originated. Appliance mode ensures that network flows are routed symmetrically to the same availability zone and network appliance. Exactly one AWS Transit Gateway must be connected to the appliance VPC to guarantee flow stickiness, because transit gateways do not share flow state information with each other.

https://wp.arcticstream.ro/wp-content/uploads/2024/10/Picture7-1024x555.jpg
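The difference between the default AZ-affinity behavior and appliance mode can be sketched as two selection functions. The default keeps traffic in the AZ where it entered the transit gateway, so a request entering in one AZ and its reply entering in another reach different firewalls. With appliance mode, AWS documents that the transit gateway selects a single network interface in the appliance VPC using a flow hash and uses it in both directions for the life of the flow. The symmetric hash below is a hypothetical stand-in for that algorithm, and the AZ and firewall names are assumptions.

```python
import hashlib

# Hypothetical appliance VPC: one firewall (behind the TGW attachment ENI) per AZ.
APPLIANCE_BY_AZ = {"az-a": "fw-az-a", "az-b": "fw-az-b"}


def select_default(ingress_az: str) -> str:
    # Default TGW behavior: stay in the AZ where the traffic entered the TGW.
    return APPLIANCE_BY_AZ[ingress_az]


def select_appliance_mode(src_ip: str, dst_ip: str) -> str:
    # Appliance mode: one choice per flow, identical for request and reply.
    # Sorting the endpoints makes this (hypothetical) hash direction-independent.
    key = "|".join(sorted((src_ip, dst_ip))).encode()
    azs = sorted(APPLIANCE_BY_AZ)
    index = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % len(azs)
    return APPLIANCE_BY_AZ[azs[index]]


client, server = "10.1.0.10", "10.2.0.20"  # client in az-a, server in az-b
print(select_default("az-a"), select_default("az-b"))  # fw-az-a fw-az-b: asymmetric
print(select_appliance_mode(client, server) == select_appliance_mode(server, client))  # True
```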

Sources: