For many years, data centers were built on a three-tier architecture. But with data center consolidation, virtualization, and the rise of hyper-converged systems, leaf-spine architecture has gradually become mainstream. So how much do you know about leaf-spine architecture, and how is it built? This article explains what leaf-spine architecture is and how to design one.
A traditional three-tier architecture consists of three layers: the core, aggregation/distribution, and access layers. The switching devices in each layer are interconnected by redundant pathways, which can create loops in the network.
In the past, the majority of data center traffic flowed between clients outside the data center and servers inside it, which we consider "north-south" traffic. The three-tier model is designed for this pattern: an incoming packet moves through three hops — it arrives at the core, is routed down to an aggregation layer switch, and is then forwarded to the access switch where the end device is connected. Modern workloads, however, are dominated by traffic between servers, or between servers and storage systems — "east-west" traffic. With this transformation, more data travels within the data center, and in a three-tier design east-west traffic must climb up through the aggregation and core layers and back down again, increasing the hop count and with it the likelihood of packet loss and significant latency. When massive east-west traffic runs through this conventional architecture, devices connected to the same switch port may contend for bandwidth, resulting in poor response times for end users. Thus the three-tier architecture is ill-suited to the modern virtualized data center, where compute and storage servers may be located anywhere within the facility.
With three-tier designs gradually losing momentum in the modern data center, spine-leaf architecture has taken their place. As shown below, the leaf-spine design consists of only two layers — the leaf layer and the spine layer — which reduces the number of hops and, with it, latency. This is the so-called "leaf-spine" architecture, in which there are only two tiers of switches between the servers and the core network.
The spine layer is made up of switches that perform routing and form the backbone of the network. The leaf layer consists of access switches that connect to endpoints such as servers and storage devices. In a leaf-spine architecture, every leaf switch is interconnected with every spine switch. With this design, any server can communicate with any other server across no more than one spine switch — at most two hops between any pair of leaf switches.
Leaf-spine has become a popular data center architecture, especially as data centers grew in scale and accumulated switching tiers. The advantages of the leaf-spine model are improved latency, reduced bottlenecks, expanded bandwidth, and scalability.
First, leaf-spine uses all interconnection links. In hyperscale data centers, hundreds or thousands of servers may be connected to the network. Here the leaf switches act as a bridge between the servers and the core network. Each leaf connects to every spine, with no interconnections among the spines themselves or among the leaves, which creates a large non-blocking fabric. In a three-tier network, by contrast, one server may need to traverse a hierarchical path through two aggregation switches and one core switch to communicate with a server on another access switch, which adds latency and creates traffic bottlenecks.
Another advantage is the ease of adding hardware and capacity. Leaf-spine architectures can be built at either layer 2 or layer 3; leaf switches can be added to increase port capacity, and spine switches can be added as needed for uplinks, expanding the interlayer bandwidth and reducing oversubscription.
Before designing a leaf-spine architecture, several related factors need to be worked out: oversubscription ratios, leaf and spine scale, uplink speeds from leaf to spine, and whether to build at layer 2 or layer 3.
Oversubscription Ratios — Oversubscription is the ratio of contention created when all devices send traffic at the same time. It can be measured in the north-south direction (traffic entering and leaving the data center) as well as east-west (traffic between devices inside the data center). Modern network designs target oversubscription ratios of 3:1 or less, measured as the ratio between downstream capacity (to servers/storage) and upstream bandwidth (to spine switches).
The figure below illustrates how to measure the oversubscription ratio at the leaf layer. The leaf switch has 48× 10G ports, giving 480Gb/s of downstream port capacity. Connecting its 4× 40G uplink ports to 40G spine switches gives it 160Gb/s of total uplink capacity. The ratio is therefore 480:160, or 3:1.
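The calculation above is simple enough to sketch in a few lines. Here is a minimal, illustrative Python helper (not a vendor tool) using the example's port counts:

```python
def oversubscription_ratio(downlink_ports, downlink_gbps, uplink_ports, uplink_gbps):
    """Ratio of downstream (server-facing) to upstream (spine-facing) capacity."""
    downstream = downlink_ports * downlink_gbps  # e.g. 48 x 10G = 480 Gb/s
    upstream = uplink_ports * uplink_gbps        # e.g. 4 x 40G = 160 Gb/s
    return downstream / upstream

# The example leaf: 48x 10G downlinks, 4x 40G uplinks.
ratio = oversubscription_ratio(48, 10, 4, 40)
print(f"{ratio:.1f}:1")  # prints "3.0:1"
```

Keeping this number at or below 3:1 is the design target mentioned above.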
Leaf and Spine Scale — Since endpoints connect only to leaf switches, the number of leaf switches in the network depends on the number of interfaces required to connect all endpoints, including multihomed ones. Because each leaf switch connects to every spine, the port density of the spine switch determines the maximum number of leaf switches in the topology. The number of spine switches is governed by a combination of the throughput required between leaf switches, the number of redundant/ECMP (equal-cost multi-path) paths between leaves, and the port density of the spine switches.
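These two scaling rules can be expressed directly. The sketch below assumes one link between every leaf and every spine (the function names are illustrative, not from any tool):

```python
def max_leaves(spine_port_count):
    # Each leaf consumes one port on every spine, so spine port
    # density caps the number of leaves in the topology.
    return spine_port_count

def leaves_needed(endpoints, leaf_downlink_ports):
    # Endpoints attach only to leaves; round up so every endpoint
    # (including multihomed ones, counted per interface) gets a port.
    return -(-endpoints // leaf_downlink_ports)  # ceiling division

print(max_leaves(64))          # a 64-port spine supports up to 64 leaves
print(leaves_needed(960, 48))  # 20 leaves for 960 single-homed 10G servers
```

The spine count itself is then chosen from the required leaf-to-leaf throughput and the desired number of ECMP paths, as described above.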
40G/100G Uplinks from Leaf to Spine — In a leaf-spine network, the uplinks from leaf to spine are typically 40G or 100G and can migrate over time from N× 40G to N× 100G. Ideally the uplinks operate at a faster speed than the downlinks, so that a single host bursting at line rate cannot cause blocking through micro-bursts.
Layer 2 or Layer 3 — Two-tier leaf-spine networks can be built at either layer 2 (VLAN everywhere) or layer 3 (subnets). Layer 2 designs provide the most flexibility, allowing VLANs to span everywhere and MAC addresses to migrate anywhere. Layer 3 designs provide the fastest convergence times and the largest scale, with ECMP fan-out supporting up to 32 or more active spine switches.
Here we take FS leaf-spine switches as an example to show how to build a leaf-spine architecture. Suppose the goal is a data center fabric connecting at least 960 10G servers. In this case, we will use the FS N5860-48SC as the leaf switch and the N8560-64C as the spine switch.
The FS N8560-64C is a 64-port layer 3 data center switch with 64× 100Gb QSFP28 ports, designed for cloud data centers.

The N5860-48SC, a fixed switch with 48× 10G SFP+ ports, serves as the leaf switch. It also provides 8× 100G QSFP28 uplinks.
These two data center switch models are used to build a 100G spine-leaf network architecture. The connections between spine and leaf switches run at 100G, while the connections between leaf switches and servers run at 10G. To cover 960 servers, we use 20 N5860-48SC leaf switches and 2 N8560-64C spine switches.
The 100G QSFP28 uplinks of each N5860-48SC connect to ports on the N8560-64C spines, and the 10G SFP+ ports connect servers, routers, and other end devices. Every leaf switch connects to every spine, so with 2 spine switches each leaf uses 2 of its 100G uplinks (one per spine). This fabric supports a maximum of 960 10G servers at 480:200, i.e. 2.4:1, oversubscription. Adding spine capacity to the fabric reduces this ratio: with four N8560-64C spines, the ratio drops to 480:400, i.e. 1.2:1, which is very close to 1:1. This ensures leaf switches forward uplink and downlink traffic with virtually no packet loss.
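The sizing of this example fabric can be checked with a short sketch. The numbers mirror the article (20 leaves with 48× 10G downlinks and one 100G uplink per spine); the function is purely illustrative:

```python
def fabric_summary(leaf_count, servers_per_leaf, server_gbps, spine_count, uplink_gbps):
    """Return (total servers, per-leaf oversubscription ratio) for the fabric."""
    servers = leaf_count * servers_per_leaf
    downstream = servers_per_leaf * server_gbps  # per-leaf server-facing capacity
    upstream = spine_count * uplink_gbps         # one uplink to each spine
    return servers, downstream / upstream

print(fabric_summary(20, 48, 10, 2, 100))  # 2 spines -> (960, 2.4), i.e. 2.4:1
print(fabric_summary(20, 48, 10, 4, 100))  # 4 spines -> (960, 1.2), i.e. 1.2:1
```

Note that growing the spine count improves the ratio without touching server-facing capacity, which is exactly the scaling property the two-tier design is chosen for.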
Depending on the network speeds, different switch models are recommended as leaf switches. The FS N series data center switches in the chart below come with complete system software and applications to facilitate rapid service deployment and management in both traditional and fully virtualized data centers.
| Specification | Leaf Option 1 | Leaf Option 2 | Leaf Option 3 |
|---|---|---|---|
| Ports | 48× 10G SFP+ and 8× 100G QSFP28 uplinks | 48× 25G SFP28 and 8× 100G QSFP28 uplinks | 48× 10G SFP+ and 8× 100G QSFP28 uplinks |
| Forwarding Rate | 1.90 Bpps | 2.98 Bpps | 1.90 Bpps |
| Switching Capacity | 4 Tbps | 4 Tbps | 4 Tbps |
| Max Power Consumption | <300W | <300W | <300W |
For spine switches, here are some recommendations. They are built with advanced feature sets, including MLAG, VXLAN, sFlow, BGP, OSPF, etc., making them an ideal choice for data center core switches.
| Specification | Spine Option 1 | Spine Option 2 | Spine Option 3 |
|---|---|---|---|
| Ports | 32× 100G QSFP28 | 32× QSFP+ | 32× 400G QSFP-DD |
| Forwarding Rate | 4.76 Bpps | 9.52 Bpps | 19 Bpps |
| Switching Capacity | 6.4 Tbps | 12.8 Tbps | 12.8 Tbps |
| Max Power Consumption | 450W | 600W | 1300W |
Clearly, leaf-spine architecture overcomes the limitations of the traditional three-tier network architecture and brings real benefits to the modern data center. Deploying a leaf-spine network with high-performance data center switches is imperative for data center managers, as the leaf-spine topology allows data centers to scale while meeting all the needs of the business.