Intelligent Lossless Network

Updated on Apr 12, 2024

What Is an Intelligent Lossless Network?

Remote Direct Memory Access over Converged Ethernet version 2 (RoCEv2) is used in distributed storage, high-performance computing (HPC), and AI scenarios to reduce CPU load, lower latency, and improve application performance. In these environments, an intelligent lossless network uses the AI-based iLossless algorithm to pursue maximum throughput, minimum latency, and zero packet loss. This improves computing and storage efficiency and provides a converged data center network that is ready for future workloads.

What Are the Advantages of an Intelligent Lossless Network?

AlphaGo's victory over Lee Sedol in March 2016 marked a milestone in AI research and is often cited as a herald of the AI-driven Fourth Industrial Revolution. Since then, a growing number of enterprises have incorporated AI into their digital transformation strategies. In the AI era, enterprise data centers (DCs) are shifting their focus from rapid service provisioning to efficient data processing. A DC has three vital elements: computing, storage, and network, which depend on and reinforce one another.

With the exponential growth of AI application data, heterogeneous computing based on graphics processing units (GPUs) and AI chips is flourishing, resulting in a 600-fold improvement in computing performance over the past five years. In data storage, solid-state drives (SSDs) offer access performance 100 times higher than that of traditional hard disk drives (HDDs), and SSDs accessed over Non-Volatile Memory Express (NVMe) improve on that by a further factor of 100. Once computing and storage are this fast, the network becomes the bottleneck, which places new demands on data center networks: zero packet loss, high throughput, and low latency. Intelligent lossless networks have emerged in response to these requirements.

The intelligent lossless network solution encompasses the following objectives:

Fully converged Ethernet network

By using RoCEv2, the intelligent lossless network aims to achieve Ethernet-based unified management of the service network, computing network, and storage network. This addresses a long-standing problem of traditional data center networks in the IP and Fibre Channel (FC) eras, where multiple networks built on different technologies had to coexist. It also meets the low latency requirements of data center storage networks.


Application acceleration

The intelligent lossless network provides intelligent lossless algorithms like Convolutional Neural Network (CNN) and Deep Q-Network (DQN) to address network congestion and offload applications to the network, thereby accelerating computing and storage applications.

Autonomous driving network

By utilizing intelligent lossless algorithms such as CNN and DQN, the intelligent lossless network autonomously learns network parameters and adjusts network settings accordingly. This ensures zero packet loss, maximum throughput, and minimal latency.

Key Technologies of the Intelligent Lossless Network Solution

Flow regulation

Flow regulation is a fundamental technology for guaranteeing zero packet loss within a network. It adjusts the data transmission rate of the traffic sender so that the traffic receiver can accept every packet, which prevents packet loss when the receiver's buffers are congested.

Prominent flow regulation technologies include:

  • 1. Priority-based Flow Control (PFC): This is the most widely adopted flow regulation technology, standardized as IEEE 802.1Qbb. When a PFC-enabled queue on a device becomes congested, the device signals the upstream device to pause transmission to that queue, achieving zero packet loss.

  • 2. PFC storm control: Also known as PFC deadlock detection, this technology resolves network traffic interruptions caused by PFC storms.

  • 3. PFC deadlock prevention: By identifying cyclic buffer dependencies and eliminating the necessary conditions for their occurrence, this technology resolves PFC deadlock issues and enhances network reliability.
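The pause-and-resume behavior of the priority-based technology in the first item above can be sketched as a toy model. This is an illustration under stated assumptions, not a vendor implementation: the class name `LosslessQueue`, the `xoff`/`xon` thresholds, the packet-count buffer accounting, and the tick-based simulation are all hypothetical. Packets that arrive in the same tick in which the pause is requested stand in for frames already in flight, which is why the buffer needs headroom above `xoff`.

```python
from dataclasses import dataclass


@dataclass
class LosslessQueue:
    """Toy ingress queue that asks its upstream neighbor to pause and resume."""
    xoff: int            # depth (packets) at which a pause is requested
    xon: int             # depth (packets) at which transmission may resume
    capacity: int        # total buffer, including headroom for in-flight packets
    depth: int = 0
    paused: bool = False
    dropped: int = 0

    def enqueue(self) -> None:
        if self.depth >= self.capacity:
            self.dropped += 1          # only reached if headroom is too small
            return
        self.depth += 1
        if not self.paused and self.depth >= self.xoff:
            self.paused = True         # models sending a pause frame upstream

    def dequeue(self) -> None:
        if self.depth > 0:
            self.depth -= 1
        if self.paused and self.depth <= self.xon:
            self.paused = False        # models sending a resume frame upstream


def simulate(ticks: int, arrivals_per_tick: int, drain_per_tick: int) -> LosslessQueue:
    """Run a sender that is faster than the drain rate against one queue."""
    queue = LosslessQueue(xoff=8, xon=4, capacity=12)
    for _tick in range(ticks):
        # The upstream device only starts a burst while it is not paused.
        if not queue.paused:
            for _pkt in range(arrivals_per_tick):
                queue.enqueue()
        for _pkt in range(drain_per_tick):
            queue.dequeue()
    return queue


if __name__ == "__main__":
    result = simulate(ticks=100, arrivals_per_tick=3, drain_per_tick=1)
    print("dropped packets:", result.dropped)
```

With these assumed values the sender offers three packets per tick while the queue drains one, yet nothing is dropped, because the sender is held back whenever the queue reaches `xoff`. Shrinking `capacity` below `xoff` plus the burst size would reintroduce loss, which is the headroom sizing problem real deployments have to solve.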

Congestion management

Congestion management is a method of controlling the overall amount of data entering a network to maintain an acceptable traffic level. Unlike flow regulation, which applies to traffic receivers, congestion management pertains to networks. It involves collaboration among forwarding devices, traffic senders, and traffic receivers, utilizing congestion feedback mechanisms to adjust traffic across the entire network and alleviate congestion.

Prominent congestion management technologies include:

  • 1. Explicit Congestion Notification (ECN): When congestion arises within a network, the congested forwarding device marks the ECN field in the IP header of the packets it forwards instead of dropping them. The traffic receiver detects the mark and notifies the traffic sender. Upon receiving the notification, the traffic sender reduces its packet sending rate, which prevents congestion-related packet loss while keeping network utilization high.

  • 2. AI ECN: Using the Intelligent Lossless (iLossless) algorithm, AI ECN enables devices to perform AI training based on real-time network traffic models. It predicts network traffic changes and the optimal ECN thresholds, and adjusts these thresholds in real time based on live network traffic. Managing lossless queue buffers this precisely keeps network performance close to optimal. AI ECN can also be combined with queue scheduling to achieve hybrid scheduling of TCP and RoCEv2 traffic, providing lossless transmission, low latency, and high throughput for lossless services.

  • 3. Network-based Proactive Congestion Control (NPCC): NPCC is a proactive congestion control technology centered around network devices. It intelligently identifies congestion status on network device ports, allowing devices to proactively send Congestion Notification Packets (CNPs) and precisely control the rate of RoCEv2 packets sent by the server. This maintains a consistently appropriate transmission rate, ensuring low latency and high throughput for RoCEv2 services in long-distance scenarios like Data Center Interconnect (DCI).

  • 4. Intelligent Quantized Congestion Notification (iQCN): iQCN enables forwarding devices to intelligently detect network congestion. An iQCN-enabled forwarding device compares the interval between CNPs sent by the receiver with the interval between rate increase events on the sender's NIC, and proactively sends CNPs to the sender when the receiver's CNPs would arrive too late. The sender therefore receives CNPs in time and does not increase its packet sending rate, which prevents congestion from worsening.
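The ECN feedback loop that all four technologies build on can be sketched in a few lines. This is a simplified illustration, not the iLossless algorithm or any standardized rate control scheme: the function names, the low and high thresholds `k_min` and `k_max`, the maximum marking probability `p_max`, and the halve-on-notification sender reaction are assumptions chosen for readability. A forwarding device marks packets with a probability that grows with queue depth, and the sender cuts its rate when a notification arrives and otherwise recovers gradually.

```python
def mark_probability(queue_depth: float, k_min: float, k_max: float, p_max: float) -> float:
    """Probability that a forwarding device ECN-marks a packet at this queue depth."""
    if queue_depth <= k_min:
        return 0.0                      # queue is short: never mark
    if queue_depth >= k_max:
        return 1.0                      # queue is long: mark every packet
    # Between the two thresholds the probability rises linearly up to p_max.
    return p_max * (queue_depth - k_min) / (k_max - k_min)


def next_sender_rate(rate: float, got_notification: bool,
                     decrease_factor: float = 0.5,
                     increase_step: float = 1.0,
                     line_rate: float = 100.0) -> float:
    """Sender reaction: cut the rate on a congestion notification, else recover."""
    if got_notification:
        return rate * decrease_factor
    return min(line_rate, rate + increase_step)


if __name__ == "__main__":
    for depth in (10, 50, 90):
        print(depth, mark_probability(depth, k_min=20, k_max=80, p_max=0.5))
    print(next_sender_rate(80.0, got_notification=True))
```

In this picture, static ECN fixes `k_min` and `k_max` once, whereas AI ECN as described above would keep adjusting them as traffic changes. Low thresholds favor latency, and high thresholds favor throughput.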

Traffic scheduling

Traffic scheduling ensures load balancing for service traffic and network links, guaranteeing quality for various types of service traffic.

  • 1. Dynamic load balancing: During packet forwarding, the system dynamically selects an appropriate link based on traffic bandwidth and the load of each member link. This ensures even traffic distribution, preventing long delays or high packet loss due to heavy load on a particular link.

  • 2. Queue scheduling: This controls traffic sending policies between different queues, providing differentiated quality assurance for traffic in each queue.
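Both scheduling ideas can be sketched with a few lines of code. The example below is a minimal illustration under stated assumptions: the helper names, the lowest-utilization selection rule, and the weighted round-robin policy are hypothetical stand-ins, since the text does not say which selection rule or scheduling discipline a device actually uses.

```python
from collections import deque
from typing import Dict, List


def pick_link(load: Dict[str, float], capacity: Dict[str, float]) -> str:
    """Dynamic load balancing: choose the member link with the lowest utilization."""
    return min(load, key=lambda name: load[name] / capacity[name])


def distribute(flows: List[float], capacity: Dict[str, float]) -> Dict[str, float]:
    """Place each flow (given by its bandwidth) on the least utilized link."""
    load = {name: 0.0 for name in capacity}
    for bandwidth in flows:
        load[pick_link(load, capacity)] += bandwidth
    return load


def weighted_round_robin(queues: Dict[str, List[str]], weights: Dict[str, int]) -> List[str]:
    """Queue scheduling: serve up to `weight` packets per queue in each round."""
    pending = {name: deque(packets) for name, packets in queues.items()}
    order: List[str] = []
    while any(pending.values()):
        for name, queue in pending.items():
            for _ in range(weights.get(name, 1)):
                if queue:
                    order.append(queue.popleft())
    return order


if __name__ == "__main__":
    print(distribute([4, 3, 2, 1], {"link_a": 10, "link_b": 10}))
    print(weighted_round_robin(
        {"roce": ["r1", "r2", "r3"], "tcp": ["t1", "t2", "t3"]},
        {"roce": 2, "tcp": 1},
    ))
```

Giving the hypothetical `roce` queue a weight of 2 means it is served twice as often as `tcp` while both have packets waiting, which is one simple way to provide differentiated treatment per queue.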

Integrated Network and Computing (INC)

In the traffic model of High-Performance Computing (HPC) networks, over 80% of the traffic consists of packets with payloads smaller than 16 bytes. For such small packets, transmission time is negligible, so the static (per-hop forwarding) latency of the switching chip dominates. Typically, Ethernet chips have a static latency of around 500 ns, while InfiniBand (IB) chips have a static latency of about 90 ns. Collaborative network-computing approaches can reduce or eliminate this disadvantage of Ethernet.

INC combines Message Passing Interface (MPI) communication data on network devices, which keeps latency low in HPC scenarios dominated by small packets. This reduces communication waiting time and improves computing efficiency.
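One way to picture combining MPI data on a network device is an element-wise sum performed at the switch, in the style of an allreduce. The sketch below is a toy model under that assumption and does not describe how INC is actually implemented: the function names, the sum operation, and the message-count comparison against a naive exchange in which every host sends to every other host are all hypothetical.

```python
from typing import Dict, List, Sequence


def aggregate_on_switch(contributions: Sequence[Sequence[float]]) -> List[float]:
    """Combine one small vector per host into a single element-wise sum."""
    if not contributions:
        raise ValueError("at least one contribution is required")
    length = len(contributions[0])
    if any(len(vector) != length for vector in contributions):
        raise ValueError("all contributions must have the same length")
    return [sum(values) for values in zip(*contributions)]


def message_counts(hosts: int) -> Dict[str, int]:
    """Messages needed for every host to learn the combined result."""
    return {
        # Naive exchange: each host sends its data to every other host.
        "host_to_host_exchange": hosts * (hosts - 1),
        # Via the switch: one message up and one result down per host.
        "via_switch": 2 * hosts,
    }


if __name__ == "__main__":
    print(aggregate_on_switch([[1, 2], [3, 4], [5, 6]]))
    print(message_counts(4))
```

Fewer messages and fewer round trips through the hosts mean less time spent waiting on communication, which matters most when each message carries only a few bytes.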

Intelligent Lossless NVMe Over Fabrics (iNOF)

iNOF technology enables rapid management and control of hosts, applying intelligent lossless network technology to storage systems. This convergence of computing and storage networks enhances their efficiency.
