M-LAG

Updated on Apr 2, 2024

 111

What Is M-LAG?

Multi-Chassis Link Aggregation Group (M-LAG) technology improves network reliability by aggregating links across two access switches in the same state. The two switches negotiate link aggregation with a user-side device or server as if they were one logical device, which raises reliability from the card level to the device level. M-LAG also allows each member device to be upgraded independently, so service traffic stays stable during maintenance. For these reasons, M-LAG is widely used in data center networks to improve link reliability and simplify device maintenance.

Why Do We Need M-LAG?

In recent years, M-LAG has become a prominent virtualization technology in network infrastructure. It did not appear all at once, however; it evolved from earlier redundancy technologies.

In traditional data center networks, redundancy is ensured through the deployment of redundant devices and links, aiming to maintain high levels of reliability. To address challenges such as underutilized links and high maintenance costs, stacking technology has been utilized. This technology allows multiple data center switches to be amalgamated into a single virtual switch, streamlining network deployment and reducing operational overhead.

With the continual growth of service traffic demands and the escalating need for enhanced network reliability, M-LAG virtualization technology has been developed. This advancement aggregates links across multiple devices, enhancing link reliability not only at the card level but also at the device level. This evolution provides a more robust networking solution, catering to the evolving demands of modern data centers.

STP and VRRP Technologies

STP and VRRP technologies have traditionally been utilized in data center networks to provide link redundancy, ensuring basic reliability standards are met.

STP+VRRP networking

However, the combined use of STP and VRRP presents certain drawbacks, making it inadequate for handling the rapid expansion of traffic and the scalability demands of modern data center networks.

- STP's link blocking mechanism leads to suboptimal utilization of Layer 2 links.

- VRRP's master/backup functionality results in underutilization of Layer 3 links.

- Servers can only connect to access devices in active/standby mode, limiting flexibility and efficiency.

To address the limitations of the STP+VRRP approach, stacking and M-LAG virtualization technologies have been developed. These advancements aim to accommodate the escalating service traffic and enhance network reliability while maintaining redundancy.

Stacking and M-LAG Virtualization Technologies

M-LAG and stacking both improve Layer 2 link utilization and network reliability in data center environments by providing redundant access and link backup. M-LAG additionally prioritizes service stability.

At the access layer of data center networks, stacking and M-LAG serve as essential horizontal virtualization technologies, ensuring resilient terminal access and scalable network operations. M-LAG stands out for its superior reliability and the ability to upgrade individual member devices independently, providing added flexibility compared to stacking.

Stacking and M-LAG each have advantages and disadvantages. M-LAG is the stronger choice where minimizing service interruption during upgrades and maintaining high network reliability are the main concerns. For data center networks that require minimal downtime and maximum reliability, M-LAG is therefore recommended as the terminal access solution.

Comparison between stacking and M-LAG

How to Establish an M-LAG System?

In the M-LAG networking shown here, ServerA is connected to DeviceA and DeviceB through inter-device link aggregation. DeviceA and DeviceB pair with each other through a Dynamic Fabric Service (DFS) group and then negotiate which of them is the master and which is the backup. After negotiation succeeds, the two devices synchronize information in real time over the peer-link. Fault detection relies on the dual-active detection (DAD) link, over which the member devices periodically exchange heartbeat packets so that dual-active conditions are detected and resolved promptly.

M-LAG Networking Topology

To establish an M-LAG system, you follow a structured process involving five key steps:

1. DFS Group Pairing: Begin by configuring a Dynamic Fabric Service (DFS) group between the devices involved in the M-LAG setup. In this step, DeviceA and DeviceB are paired together through the DFS group.
2. DFS Group Master/Backup Negotiation: Once the DFS group is established, DeviceA and DeviceB negotiate their roles within the group, determining which device will act as the master and which will serve as the backup. This negotiation ensures redundancy and failover capabilities within the M-LAG system.
3. M-LAG Member Interface Master/Backup Negotiation: Following the determination of master and backup roles for the devices, each member interface within the M-LAG setup negotiates its status as either a master or backup interface. This negotiation process optimizes the distribution of traffic and ensures efficient utilization of network resources.
4. Dual-Active Detection (DAD): M-LAG fault detection relies on the Dual-Active Detection (DAD) mechanism. Through DAD, member devices periodically exchange heartbeat packets via a designated link, allowing them to detect and resolve instances of dual-active conditions that could potentially disrupt network operations.
5. M-LAG Information Synchronization: Finally, DeviceA and DeviceB synchronize critical information in real-time via the peer-link established between them. This synchronization ensures consistency and coherence in the M-LAG configuration, enabling seamless failover and enhanced network reliability.

By following these five steps, you can effectively establish an M-LAG system that provides enhanced redundancy, load balancing, and fault tolerance for your network infrastructure.
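
The master/backup negotiation in step 2 can be sketched in Python. The article does not say how the master is chosen, so the rule used here (higher priority wins, and the lower system MAC address breaks a tie) is an assumption for illustration only. The `Member` class, its field names, and the example addresses are hypothetical and do not correspond to any vendor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    """One M-LAG member device (hypothetical model, not a vendor API)."""
    name: str
    priority: int    # assumed DFS group priority; higher is preferred
    system_mac: str  # assumed tie-breaker; lower address is preferred


def mac_to_int(mac: str) -> int:
    """Convert 'aa:bb:cc:dd:ee:ff' or 'aa-bb-cc-dd-ee-ff' to an integer."""
    return int(mac.replace(":", "").replace("-", ""), 16)


def negotiate_roles(a: Member, b: Member) -> dict:
    """Return {device name: 'master' or 'backup'} under the assumed rule."""
    # Sort so that the preferred device comes first: highest priority,
    # then lowest MAC address when priorities are equal.
    master, backup = sorted(
        (a, b), key=lambda m: (-m.priority, mac_to_int(m.system_mac))
    )
    return {master.name: "master", backup.name: "backup"}


device_a = Member("DeviceA", priority=150, system_mac="00:e0:fc:00:00:0a")
device_b = Member("DeviceB", priority=100, system_mac="00:e0:fc:00:00:01")
roles = negotiate_roles(device_a, device_b)
print(roles)  # {'DeviceA': 'master', 'DeviceB': 'backup'}
```

Keeping the decision in a single deterministic function means both devices reach the same result from the same inputs, which is the property the negotiation needs.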

How Does an M-LAG Work?

M-LAG Operational Scenarios

Known Unicast Traffic Forwarding:

In functional M-LAG setups, known unicast traffic, both from the user side to the network side and vice versa, undergoes load balancing orchestrated by the M-LAG master and backup devices using per-flow mode. This ensures efficient utilization of network resources and balanced traffic distribution.

Known unicast traffic forwarding in an M-LAG
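
Per-flow load balancing can be illustrated with a small Python sketch. It assumes a flow is identified by a 5-tuple and that the link is chosen by hashing that tuple; real switches use hardware hash functions and configurable hash fields, so the use of SHA-256, the function name `pick_link`, and the link names are all assumptions made for illustration.

```python
import hashlib


def pick_link(flow: tuple, active_links: list) -> str:
    """Choose one active link for a flow by hashing its identifying fields.

    Every packet of the same flow hashes to the same link, so packets
    within a flow stay in order while different flows spread out.
    """
    if not active_links:
        raise ValueError("no active links to forward on")
    key = "|".join(str(field) for field in flow).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(active_links)
    return active_links[index]


links = ["to-DeviceA", "to-DeviceB"]  # hypothetical member link names
flow = ("10.0.0.5", "10.0.1.9", "tcp", 49152, 443)  # src, dst, proto, ports
first = pick_link(flow, links)
second = pick_link(flow, links)
print(first == second)  # True: a flow always maps to the same link
```

If one link goes down and is removed from `active_links`, the same function redistributes flows over the links that remain.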

Multicast, Broadcast, and Unknown Unicast Traffic Forwarding:

Operational M-LAG configurations facilitate the forwarding of multicast, broadcast, and unknown unicast traffic between the user side and the network side. To prevent potential loops, M-LAG employs a unidirectional isolation mechanism, which ensures that traffic received via the peer-link interface is not forwarded through M-LAG member interfaces.

Multicast, broadcast, and unknown unicast traffic forwarding in an M-LAG
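
The unidirectional isolation rule can be expressed as a flood-list calculation. The following Python sketch is a simplified model under stated assumptions: each port is classified as a peer-link, an M-LAG member interface, or any other port, and the `Port` class and port names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    name: str
    kind: str  # "peer-link", "mlag-member", or "other" (assumed classes)


def flood_ports(ingress: Port, ports: list) -> list:
    """Return the names of the ports a flooded frame is sent out of."""
    out = []
    for port in ports:
        if port == ingress:
            continue  # never send a frame back out of its ingress port
        if ingress.kind == "peer-link" and port.kind == "mlag-member":
            continue  # unidirectional isolation: peer-link -> member is blocked
        out.append(port.name)
    return out


peer = Port("peer-link", "peer-link")
member = Port("mlag-member-1", "mlag-member")
uplink = Port("uplink", "other")
ports = [peer, member, uplink]

print(flood_ports(member, ports))  # ['peer-link', 'uplink']
print(flood_ports(peer, ports))    # ['uplink']
```

The rule is one-way: a frame arriving on a member interface may still cross the peer-link, but a frame arriving from the peer-link is kept off member interfaces, so the dual-homed device never receives the same flooded frame twice.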

M-LAG Failure Scenarios

Uplink Failure:

If the uplink of the M-LAG master device fails, traffic that reaches the master is forwarded over the peer-link to the backup device and sent out through the backup device's uplink, so connectivity is maintained. For Layer 3 network connections, a best-effort link must be configured between the master and backup devices to carry this traffic. A failure of the DAD link on its own does not affect the M-LAG system, but a failure of the peer-link can lead to a dual-active conflict, so the devices must be configured to prevent traffic loss in that case.

Traffic forwarding in case of an uplink failure

M-LAG Member Interface Failure:

When an M-LAG member interface fails, traffic from the user side to the network side is forwarded over the remaining links. The network-side device is unaware of the fault and continues to send traffic to both M-LAG devices. Because the failure turns the dual-homing scenario into a single-homing one, the behavior of the interface isolation mechanism changes until the interface recovers.

Traffic forwarding in case of an M-LAG member interface failure
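
The effect on network-to-user traffic can be sketched in Python. This is a minimal model that assumes the isolation mechanism is relaxed while one member interface is down, so that traffic arriving at the device with the failed interface can cross the peer-link and reach the server through the other device. The function name and the state dictionary are hypothetical.

```python
def path_to_server(arrival_device: str, member_up: dict) -> list:
    """Return the hops taken by network-side traffic destined for ServerA.

    member_up maps each M-LAG device name to the state of its member
    interface toward ServerA (True means up).
    """
    devices = ("DeviceA", "DeviceB")
    if arrival_device not in devices:
        raise ValueError(f"unknown device: {arrival_device}")
    peer = devices[1] if arrival_device == devices[0] else devices[0]

    if member_up[arrival_device]:
        # Normal case: deliver directly over the local member interface.
        return [arrival_device, "ServerA"]
    if member_up[peer]:
        # Single-homed case (assumed behavior): isolation is relaxed, so
        # the traffic detours over the peer-link to the healthy device.
        return [arrival_device, "peer-link", peer, "ServerA"]
    return []  # both member interfaces are down: the server is unreachable


state = {"DeviceA": False, "DeviceB": True}  # DeviceA's member interface failed
print(path_to_server("DeviceA", state))  # ['DeviceA', 'peer-link', 'DeviceB', 'ServerA']
print(path_to_server("DeviceB", state))  # ['DeviceB', 'ServerA']
```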

Peer-Link Failure:

Detection of a peer-link failure triggers immediate action by the M-LAG member device, initiating the dual-active detection process via the designated link. In such events, the backup device restricts traffic forwarding to prevent broadcast storms or MAC address flapping, ensuring network stability. Upon peer-link recovery, interfaces are restored gradually to their operational states, allowing the interface isolation mechanism to resume functionality.

Traffic forwarding in case of a peer-link failure
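
The decision each device makes after a peer-link failure can be sketched as a small function. The sketch assumes that heartbeats on the DAD link tell a device whether its peer is still alive, and that a missing heartbeat is treated as a peer device failure; that second rule and the `Action` names are assumptions for illustration, not documented behavior.

```python
from enum import Enum


class Action(Enum):
    KEEP_FORWARDING = "keep forwarding"
    BLOCK_SERVICE_INTERFACES = "block service interfaces"
    TAKE_OVER_AS_MASTER = "take over as master"


def on_peer_link_down(role: str, peer_heartbeat_seen: bool) -> Action:
    """Decide what a member device does once its peer-link has failed."""
    if role not in ("master", "backup"):
        raise ValueError(f"unknown role: {role}")
    if role == "master":
        # The master keeps forwarding whether or not the peer is alive.
        return Action.KEEP_FORWARDING
    if peer_heartbeat_seen:
        # Both devices are alive but cannot synchronize: the backup stops
        # forwarding to avoid broadcast storms and MAC address flapping.
        return Action.BLOCK_SERVICE_INTERFACES
    # Assumed rule: no heartbeat means the master itself has failed.
    return Action.TAKE_OVER_AS_MASTER


print(on_peer_link_down("backup", peer_heartbeat_seen=True).value)
# block service interfaces
```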

M-LAG Member Device Failure:

In the event of a master device failure, the backup device seamlessly assumes the master role and continues traffic forwarding. Conversely, if the backup device fails, the master and backup statuses remain unchanged, with traffic still flowing through the master device. Upon recovery of a faulty M-LAG member device, renegotiation occurs between devices, restoring load balancing and preserving the original master and backup roles.

Once the devices have renegotiated, forwarding returns to the normal pattern. Known unicast traffic in both directions, from the user side to the network side and back, is again load balanced per flow across the master and backup devices.

Multicast, broadcast, and unknown unicast traffic from the user side to the network side is again flooded across both M-LAG devices. To prevent loops, the unidirectional isolation mechanism ensures that traffic received on the peer-link interface is not forwarded through an M-LAG member interface.

Traffic forwarding in case of an M-LAG member device failure

Application Scenarios of M-LAG

M-LAG finds its application in various network scenarios where devices or switches are connected redundantly to Layer 2, VXLAN, or Layer 3 networks, including multi-level M-LAG setups.

Dual-Homing to Layer 2 Network:

In this scenario, a device connects to a Layer 2 network via M-LAG. To prevent loops, the M-LAG devices must be virtualized into the same Spanning Tree Protocol (STP) logical node. This can be achieved by configuring the M-LAG devices as STP root bridges or using Virtual Spanning Tree Protocol (V-STP) to synchronize their STP status.

Dual-Homing to VXLAN Network:

When a device is dual-homed to a VXLAN network through M-LAG, the M-LAG devices need to act as VXLAN Tunnel Endpoints (VTEPs). DeviceA and DeviceB then establish a VXLAN tunnel with an external device using the VTEP's IP address, whether the tunnel is established manually or automatically.

Dual-Homing to Layer 3 Network:

In this scenario, M-LAG member devices serve as gateways between Layer 2 and Layer 3 networks. Both devices function as gateways and must share the same gateway IP address and MAC address for communication with network-side devices. Configuring identical IP addresses and virtual MAC addresses ensures seamless gateway functionality.
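
The requirement that both gateways share the same IP address and virtual MAC address lends itself to a simple consistency check. The Python sketch below assumes each device's gateway settings have already been collected into a dictionary; the keys `ip` and `virtual_mac`, the function names, and the example values are hypothetical.

```python
def normalize_mac(mac: str) -> str:
    """Strip common separators so differently formatted MACs compare equal."""
    return mac.replace(":", "").replace("-", "").replace(".", "").lower()


def gateway_mismatches(a: dict, b: dict) -> list:
    """Return the gateway fields that differ between two M-LAG members."""
    problems = []
    if a["ip"] != b["ip"]:
        problems.append("ip")
    if normalize_mac(a["virtual_mac"]) != normalize_mac(b["virtual_mac"]):
        problems.append("virtual_mac")
    return problems


device_a = {"ip": "192.0.2.1", "virtual_mac": "00:00:5e:00:01:01"}
device_b = {"ip": "192.0.2.1", "virtual_mac": "0000-5e00-0101"}
print(gateway_mismatches(device_a, device_b))  # []
```

An empty list means the two devices present the same gateway identity; any field name in the list points to a setting that needs to be aligned.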

Multi-Level M-LAG:

In large networks, M-LAG can be deployed on both spine and leaf nodes to enhance link reliability. Each pair of devices establishes an M-LAG, forming a multi-level M-LAG setup. To prevent STP loops in this scenario, Virtual Spanning Tree Protocol (V-STP) is utilized to synchronize the STP status of member devices within each M-LAG.

Overall, M-LAG offers flexibility and redundancy in various network architectures, ensuring efficient and reliable connectivity across different layers and network types.