
AI ECN

Posted on Mar 29, 2024 by

What Is AI ECN?

The AI ECN (Artificial Intelligence Explicit Congestion Notification) function analyzes traffic patterns in real time and dynamically adjusts the ECN thresholds of lossless queues to match them. The goal is to give lossless services the best achievable combination of low delay, high throughput, and zero packet loss.

Differences Between AI ECN and ECN

The congestion control mechanism commonly used in RDMA over Converged Ethernet version 2 (RoCEv2) networks works as follows. When a network device detects congestion, it sets the ECN field of the packets it forwards to the receiving server. On receiving an ECN-marked packet, the receiving server sends a Congestion Notification Packet (CNP) back to the sending server, instructing it to reduce its packet sending rate.
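The marking-and-feedback loop above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the threshold value, the `Packet` type, and the fixed halving of the rate on each CNP are all hypothetical. Real switches commonly mark probabilistically between a minimum and a maximum threshold, and real NICs rate-limit CNPs and reduce their rate more gradually (for example, with DCQCN).

```python
from dataclasses import dataclass

# Hypothetical parameters for illustration only; real switches and NICs
# expose different knobs (e.g. min/max marking thresholds, DCQCN settings).
ECN_THRESHOLD_BYTES = 100_000  # switch marks packets above this queue depth
RATE_CUT_FACTOR = 0.5          # sender halves its rate per CNP (simplification)


@dataclass
class Packet:
    size: int
    ecn_ce: bool = False  # Congestion Experienced codepoint in the IP header


def switch_forward(pkt: Packet, queue_depth_bytes: int) -> Packet:
    """Network device: mark CE if the egress queue exceeds the ECN threshold."""
    if queue_depth_bytes > ECN_THRESHOLD_BYTES:
        pkt.ecn_ce = True
    return pkt


def receiver_handle(pkt: Packet) -> bool:
    """Receiving server: return True if a CNP should go back to the sender."""
    return pkt.ecn_ce


def sender_on_cnp(rate_gbps: float) -> float:
    """Sending server: reduce the sending rate after a CNP arrives."""
    return rate_gbps * RATE_CUT_FACTOR


rate = 100.0
for depth in (20_000, 80_000, 150_000, 60_000):
    pkt = switch_forward(Packet(size=1024), depth)
    if receiver_handle(pkt):
        rate = sender_on_cnp(rate)
    print(f"queue={depth} B, CE={pkt.ecn_ce}, rate={rate} Gbit/s")
```

In this run only the 150,000-byte sample exceeds the threshold, so the sender's rate drops once, from 100 to 50 Gbit/s.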

Both AI ECN and traditional ECN use this congestion control mechanism. However, traditional ECN requires the ECN thresholds to be configured manually, and the device detects congestion only when buffer usage exceeds the configured threshold. This falls short for lossless services that demand uninterrupted transmission: a manually set threshold cannot adapt to the changing buffer space of a queue or to the diverse traffic models on the network.

AI ECN addresses these limitations with intelligent algorithms. Designed specifically for lossless queues, it lets a device perform AI training based on the live network's traffic model and then dynamically adjusts the ECN thresholds according to traffic characteristics such as queue length. This precise control of the lossless queue buffer keeps performance optimal throughout the network.

The Significance of AI ECN

To handle buffer congestion and control traffic for lossless queues, you can configure two types of buffer thresholds: ECN (Explicit Congestion Notification) and PFC (Priority Flow Control). When the buffer usage of an outbound queue on a device reaches the ECN threshold, the device notifies the sending server to reduce its packet sending rate. When the buffer usage of an inbound queue on the device reaches the PFC threshold, the device signals the upstream device to stop transmitting traffic. Because the inbound direction rarely becomes congested unless the outbound direction is already congested, the ECN threshold should be triggered first when congestion occurs, so that the sending server slows down. Triggering PFC should be avoided where possible, because pausing the upstream device interrupts traffic flow.
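The division of labor between the two thresholds can be expressed as a small check. The class and function names and the byte values below are hypothetical; the sketch only encodes the rules described above: ECN reacts to outbound queue usage, PFC reacts to inbound buffer usage, and ECN should be set to fire first.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LosslessQueueThresholds:
    """Hypothetical per-queue thresholds, in bytes of buffer usage."""
    ecn_bytes: int  # outbound queue usage that triggers ECN marking
    pfc_bytes: int  # inbound buffer usage that triggers a PFC pause

    def validate(self) -> None:
        # ECN must fire before PFC so that senders slow down before the
        # upstream device is told to stop transmitting.
        if self.ecn_bytes >= self.pfc_bytes:
            raise ValueError("ECN threshold must be below the PFC threshold")


def congestion_actions(t: LosslessQueueThresholds,
                       outbound_used: int, inbound_used: int) -> List[str]:
    """List the actions the device would take for the given buffer usage."""
    actions = []
    if outbound_used >= t.ecn_bytes:
        actions.append("ECN")  # mark packets; the sender reduces its rate
    if inbound_used >= t.pfc_bytes:
        actions.append("PFC")  # pause the upstream device (interrupts traffic)
    return actions


thresholds = LosslessQueueThresholds(ecn_bytes=200_000, pfc_bytes=600_000)
thresholds.validate()
print(congestion_actions(thresholds, outbound_used=250_000, inbound_used=100_000))
```

With these values only the outbound queue has crossed its threshold, so the call returns `['ECN']`: the sender is asked to slow down and no pause frame is sent.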

Setting an appropriate ECN threshold is crucial to achieving low delay, high throughput, and zero packet loss. However, the network's traffic characteristics, including size, rate, and buffer usage, change constantly, and different types of traffic place different requirements on the ECN threshold. Several factors should be considered when determining the ECN threshold:

Interval between CNP transmission and packet rate reduction

There is a delay between the moment a device marks packets with ECN (and the receiving server returns a CNP) and the moment the sending server actually reduces its sending rate. During this interval, the server keeps transmitting at its original rate, which can deepen congestion in the queue buffer, trigger Priority Flow Control (PFC), and interrupt services. To minimize this impact, the ECN threshold should leave a sufficient buffer gap below the PFC threshold, so that the buffer can absorb the traffic the server sends before it learns of the congestion.
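As a first-order estimate, the gap must hold at least the data a sender transmits during its reaction delay: gap ≥ sending rate × reaction delay. The sketch below works this out with hypothetical numbers (a 100 Gbit/s sender, a 10 µs reaction delay, and a 600,000-byte PFC threshold); it ignores multiple senders converging on one queue and partial rate reductions, both of which change the real figure.

```python
def min_ecn_pfc_gap_bytes(rate_gbps: int, reaction_delay_ns: int) -> int:
    """Bytes a sender still transmits before it reacts to congestion.

    First-order estimate: rate x delay. Gbit/s multiplied by ns gives bits.
    Ignores multiple senders and partial rate reductions.
    """
    bits = rate_gbps * reaction_delay_ns
    return bits // 8


def max_ecn_threshold_bytes(pfc_threshold_bytes: int, gap_bytes: int) -> int:
    """Highest ECN threshold that still leaves `gap_bytes` below PFC."""
    return max(0, pfc_threshold_bytes - gap_bytes)


gap = min_ecn_pfc_gap_bytes(rate_gbps=100, reaction_delay_ns=10_000)  # 10 us
print(gap, max_ecn_threshold_bytes(600_000, gap))
```

Here 100 Gbit/s × 10 µs = 1,000,000 bits, or 125,000 bytes, so the ECN threshold should sit no higher than 475,000 bytes.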

Balancing delay-sensitive and throughput-sensitive traffic

  • High ECN thresholds: Marking packets later helps meet the bandwidth requirements of throughput-sensitive elephant flows, because it allows a higher sending rate and leaves more buffer space to absorb burst traffic in a queue. However, it results in longer queuing delays, which hurt delay-sensitive mice flows.

  • Low ECN thresholds: Marking packets early promptly instructs the server to reduce its sending rate, which keeps buffer depth low and reduces queuing and queue delay. This benefits delay-sensitive mice flows. However, a low ECN threshold restricts the bandwidth available to throughput-sensitive elephant flows and may not guarantee them high throughput.
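The delay side of this trade-off can be roughly quantified: if the queue hovers around the ECN threshold, a newly arriving packet waits about threshold ÷ link rate before it is transmitted. The sketch below compares a hypothetical high threshold (400,000 bytes) with a low one (50,000 bytes) on a 100 Gbit/s port; the "standing queue" assumption is a simplification.

```python
def standing_queue_delay_us(queue_bytes: int, link_gbps: int) -> float:
    """Time to drain `queue_bytes` at `link_gbps` (bits / (Gbit/s) = ns)."""
    delay_ns = queue_bytes * 8 / link_gbps
    return delay_ns / 1000


for label, threshold in (("high", 400_000), ("low", 50_000)):
    print(label, standing_queue_delay_us(threshold, link_gbps=100), "us")
```

The high threshold adds about 32 µs of queuing delay for a mouse flow stuck behind the queue, versus about 4 µs for the low one; the low threshold, in turn, leaves less buffer for elephant-flow bursts.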

An optimal ECN threshold balances these considerations: it meets the requirements of different traffic types, minimizes service interruptions, and delivers both low delay and high throughput without packet loss.

Operational Mechanism of AI ECN

AI ECN relies on Embedded AI (EAI) for intelligent computation. EAI is a framework integrated into the device that supports AI functions: it handles model management, data acquisition, and data preprocessing for AI ECN, and delivers inference results to AI ECN. The figure illustrates the process: the device collects traffic characteristics from the live network and forwards them to the AI ECN component, which then sets the optimal ECN thresholds for lossless queues based on the inference results from the EAI system. This keeps latency low and throughput high for lossless services across various traffic scenarios.

1. The forwarding component on the network device collects traffic characteristics, including queue buffer usage, bandwidth throughput, and the current ECN thresholds, and uses telemetry to report the network traffic status to the AI ECN component in real time.

2. Once the AI ECN function is enabled, the AI ECN component automatically subscribes to the services of the EAI system. It receives the network traffic status pushed by the device, identifies the current traffic model, and determines whether that model matches a known scenario in the EAI system.

  • If the traffic model matches a trained model in the EAI system, the AI ECN component treats it as a known scenario and uses the inference results from the EAI system to calculate ECN thresholds suited to the current network status. This is called the model inference mode, or NN mode, because it uses neural network algorithms.

  • If the traffic model does not match any trained model in the EAI system, the AI ECN component treats it as an unknown scenario. It then uses a heuristic search algorithm to adjust the current ECN thresholds continuously in real time, based on the live network status, until it reaches thresholds that deliver low delay and high bandwidth. This is called the heuristic inference mode, or BBR mode, because it employs the Bottleneck Bandwidth and Round-trip propagation time (BBR) algorithm.

3. The AI ECN component delivers the optimal ECN threshold configuration to the device, which adjusts the ECN thresholds of the lossless queues accordingly.

4. The device repeats these operations whenever the traffic status changes, keeping the performance of lossless services optimal.
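The four steps can be sketched as a control loop. Everything here is a hypothetical stand-in: the telemetry fields, the lookup table that plays the role of the trained neural-network models in NN mode, the matching tolerance, and the simple hill-climbing rule used for unknown scenarios. That rule is a generic heuristic search chosen for illustration, not the BBR-based algorithm the device uses.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class TrafficSample:
    """Step 1: hypothetical telemetry fields pushed by the device."""
    queue_bytes: int
    throughput_gbps: float
    ecn_threshold_bytes: int


# Hypothetical stand-in for trained models: (queue, throughput) -> threshold.
KNOWN_SCENARIOS: Dict[Tuple[int, float], int] = {
    (50_000, 90.0): 150_000,  # e.g. an elephant-flow-heavy scenario
    (10_000, 40.0): 60_000,   # e.g. a mice-flow-heavy scenario
}
MATCH_TOLERANCE = 0.2  # max relative distance for a "known" scenario


def match_scenario(s: TrafficSample) -> Optional[int]:
    """Return the trained threshold if the sample matches a known scenario."""
    for (queue, tput), threshold in KNOWN_SCENARIOS.items():
        dist = math.hypot((s.queue_bytes - queue) / queue,
                          (s.throughput_gbps - tput) / tput)
        if dist <= MATCH_TOLERANCE:
            return threshold
    return None


def heuristic_step(s: TrafficSample, link_gbps: float = 100.0,
                   step: int = 10_000, max_queue_bytes: int = 100_000) -> int:
    """Hill-climbing stand-in for heuristic mode: nudge the threshold."""
    if s.queue_bytes > max_queue_bytes:
        # Queue too deep (delay too high): mark earlier.
        return max(step, s.ecn_threshold_bytes - step)
    if s.throughput_gbps < 0.9 * link_gbps:
        # Link underused: mark later so senders can go faster.
        return s.ecn_threshold_bytes + step
    return s.ecn_threshold_bytes


def decide(s: TrafficSample) -> Tuple[str, int]:
    """Step 2: NN mode for known scenarios, heuristic mode otherwise."""
    threshold = match_scenario(s)
    if threshold is not None:
        return "NN", threshold
    return "heuristic", heuristic_step(s)


def control_loop(observations: List[Tuple[int, float]],
                 device: Dict[str, int]) -> List[Tuple[str, int]]:
    """Steps 1-4: sample, decide, apply, repeat for each new status."""
    history = []
    for queue_bytes, tput in observations:
        sample = TrafficSample(queue_bytes, tput, device["ecn_threshold_bytes"])
        mode, threshold = decide(sample)
        device["ecn_threshold_bytes"] = threshold  # step 3: apply on device
        history.append((mode, threshold))
    return history


device = {"ecn_threshold_bytes": 100_000}
obs = [(30_000, 70.0), (30_000, 70.0), (120_000, 95.0), (52_000, 88.0)]
print(control_loop(obs, device))
```

The first three samples match no known scenario, so the heuristic raises the threshold while the link is underused and lowers it when the queue grows too deep; the last sample is close to a trained scenario, so the loop switches to the table's value, mirroring the NN/heuristic split in step 2.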
