DDC Technology: A Revolutionary Solution for AIGC Networks
In 2023, artificial intelligence-generated content (AIGC) technology, of which ChatGPT is a prime example, flourished and transformed fields such as text generation, code development, and poetry writing. In this article, we explore DDC, a groundbreaking networking technology that underpins the new generation of AIGC networks. This innovative technology promises to reshape the networking landscape, enhancing efficiency and connectivity in unprecedented ways.
Three Methods of GPU Load Sharing
The extraordinary power of these AIGC large models stems not only from their vast amounts of training data, but also from the continuous evolution and advancement of their algorithms. Training such large models typically requires multiple GPUs to share the workload. Three methods are employed to achieve efficient load sharing: data parallelism, tensor parallelism, and pipeline parallelism.
Data Parallelism
Data parallelism involves splitting the model's input data into multiple mini-batches and processing them in parallel across different GPUs. Each GPU holds a complete copy of the model and processes its own mini-batches independently. Through inter-GPU communication and synchronization, the gradients are aggregated so that all copies apply the same parameter update. Data parallelism is suitable for models with large-scale training datasets, accelerating training and improving the model's convergence speed.
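To make this concrete, here is a minimal, framework-free sketch in Python/NumPy that simulates data parallelism on one machine. Each simulated "GPU" holds a full copy of the weights and computes a gradient on its own mini-batch, and averaging the gradients (an all-reduce) keeps every replica identical. The toy linear model and all names are illustrative assumptions, not part of any real training framework.

```python
import numpy as np

def grad_mse_linear(w, x_batch, y_batch):
    """Gradient of mean-squared error for a toy linear model y = x @ w."""
    pred = x_batch @ w
    return 2 * x_batch.T @ (pred - y_batch) / len(x_batch)

rng = np.random.default_rng(0)
x, y = rng.normal(size=(64, 4)), rng.normal(size=(64, 1))

n_gpus, lr = 4, 0.1
# Every simulated GPU starts with an identical copy of the model weights.
replicas = [np.zeros((4, 1)) for _ in range(n_gpus)]

for step in range(10):
    shards_x = np.array_split(x, n_gpus)   # split the batch into mini-batches
    shards_y = np.array_split(y, n_gpus)
    # Each replica computes a gradient on its own shard, independently.
    grads = [grad_mse_linear(w, xs, ys)
             for w, xs, ys in zip(replicas, shards_x, shards_y)]
    # "All-reduce": average the gradients so every replica applies the same update.
    avg_grad = sum(grads) / n_gpus
    replicas = [w - lr * avg_grad for w in replicas]

print("replicas stay in sync:", all(np.allclose(replicas[0], w) for w in replicas))
```

The only inter-GPU traffic in this scheme is the gradient exchange once per step, which is why data parallelism scales well when the dataset, rather than the model, is the bottleneck.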
Tensor Parallelism
Tensor parallelism is typically employed when a model is too large to fit in a single processor's memory. In this method, individual layers or tensor operations are split across GPUs: each GPU stores a slice of the weights and executes the corresponding part of the computation on the same input. Frequent inter-GPU communication and synchronization are required to collect and combine the partial outputs, which can incur high communication overhead. High-speed connections between processors are therefore necessary for tensor parallelism to minimize exchange delays.
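The following NumPy sketch illustrates the idea under simplified assumptions: a layer's weight matrix is split column-wise across simulated devices, each device multiplies the same input by its own weight slice, and concatenating the partial results (an all-gather) reproduces the unsharded output.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(8, 16))        # one input batch, seen by every device
w = rng.normal(size=(16, 32))       # a layer too "large" for one device

n_devices = 4
# Column-wise tensor split: each device stores only a slice of the weights.
w_shards = np.array_split(w, n_devices, axis=1)

# Each device multiplies the full input by its own weight slice.
partial_outputs = [x @ w_i for w_i in w_shards]

# "All-gather": concatenating the partial results reconstructs the full output.
y = np.concatenate(partial_outputs, axis=1)

assert np.allclose(y, x @ w)   # identical to the unsharded computation
```

Because that all-gather happens inside every sharded layer on every step, the interconnect sits on the critical path, which is exactly why tensor parallelism demands high-bandwidth, low-latency links.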
Pipeline Parallelism
Pipeline parallelism breaks the model's computational flow into multiple stages and executes these stages across different GPUs in a pipelined manner. Each GPU handles a specific portion of the overall model computation and passes its results to the next GPU for further processing. This approach reduces overall training time and is especially beneficial for models with complex computational flows and many consecutive stages, but it requires careful management of the pipeline to avoid large bubbles, where some GPUs sit idle while waiting for dependent results from other stages. In practice, these three parallelism strategies are often combined to further improve training.
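To illustrate the pipelining idea specifically, the rough simulation below prints a schedule under a deliberately simplified timing model (unit-time stages, invented tick numbering): stage s works on micro-batch m at tick s + m, and idle slots, the "bubbles" mentioned above, appear only while the pipeline fills and drains.

```python
# A minimal sketch of a pipelined forward pass (illustrative, framework-free):
# each stage runs on its own "GPU"; splitting the batch into micro-batches
# lets stage k work on micro-batch m while stage k+1 works on micro-batch m-1.

n_stages, n_microbatches = 3, 5

# With unit-time stages, stage s processes micro-batch m at tick s + m.
total_ticks = n_stages + n_microbatches - 1
for tick in range(total_ticks):
    row = []
    for s in range(n_stages):
        m = tick - s
        row.append(f"mb{m}" if 0 <= m < n_microbatches else "idle")  # idle = bubble
    print(f"t={tick}: " + "  ".join(row))

# Sequential cost: n_stages * n_microbatches ticks; pipelined cost: total_ticks.
print(f"sequential={n_stages * n_microbatches} ticks, pipelined={total_ticks} ticks")
```

As the printout shows, more micro-batches amortize the fill-and-drain bubbles, which is why real pipeline schedules chop the batch finely.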
Three Traditional Solutions to Support the Operation of AIGC
Traditionally, three common solutions have been used to support the operation of AIGC: InfiniBand, RDMA, and frame switches.
InfiniBand Networking
InfiniBand networking is a high-speed interconnect technology used in high-performance computing and data centers. Its advantages include high bandwidth, low latency, and congestion-free, lossless transmission. However, it is expensive, costing several times more than traditional Ethernet networking.
RDMA Networking
RDMA (Remote Direct Memory Access) is a communication mechanism in which data moves directly between application memory and the network card, bypassing the CPU and the operating system's complex kernel stack. This significantly increases data throughput while reducing latency. Previously, RDMA was mostly carried over InfiniBand networks; it is now progressively being ported to Ethernet. The current mainstream networking scheme builds RDMA-capable networks on the RoCE v2 protocol.
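As a rough illustration of why RoCE v2 can run over ordinary Ethernet, the sketch below models its encapsulation: the InfiniBand transport payload is carried inside a standard UDP/IP/Ethernet frame, with the well-known UDP destination port 4791 marking the traffic as RoCE v2. The field layout is heavily simplified for illustration and omits most real header fields; the class and field names are invented for this example.

```python
from dataclasses import dataclass

# A minimal, illustrative view of RoCE v2 encapsulation: InfiniBand transport
# data rides inside an ordinary UDP/IP/Ethernet frame, which is what lets
# RDMA traffic traverse standard Ethernet switches and IP routers.

ROCEV2_UDP_DST_PORT = 4791  # well-known UDP port assigned to RoCE v2

@dataclass
class RoCEv2Frame:
    eth_dst: str          # Ethernet layer
    eth_src: str
    ip_src: str           # IP layer (routable, so it works across L3 fabrics)
    ip_dst: str
    udp_src_port: int     # often varied per flow to help ECMP load balancing
    udp_dst_port: int     # fixed at 4791 to mark the payload as RoCE v2
    ib_payload: bytes     # InfiniBand transport headers + RDMA payload

frame = RoCEv2Frame(
    eth_dst="aa:bb:cc:dd:ee:ff", eth_src="11:22:33:44:55:66",
    ip_src="10.0.0.1", ip_dst="10.0.0.2",
    udp_src_port=49152, udp_dst_port=ROCEV2_UDP_DST_PORT,
    ib_payload=b"...",
)
print(frame.udp_dst_port)  # 4791 identifies the frame as RoCE v2 traffic
```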
Frame Switch
A frame switch is a chassis-based network switch designed to handle frame-based protocols such as Ethernet. However, due to limited scalability, high device power consumption, and large failure domains, this approach is appropriate only for small-scale AI computing cluster deployments.
New Generation of AIGC Network: DDC Technology
Due to the performance bottlenecks of traditional Ethernet, conventional approaches still suffer performance losses such as congestion and packet loss, and their scalability is insufficient. In response to these limitations, a novel solution known as DDC (Distributed Disaggregated Chassis) has emerged. DDC deconstructs the conventional frame switch into distributed components, enhancing scalability and enabling the network scale to be tailored to the size of the AI cluster.
DDC fulfills the network requirements of large-scale AI model training in terms of scale and bandwidth throughput. However, network operation is not only about these two aspects; it also needs to be optimized for latency, load balancing, management efficiency, and more. To tackle these challenges, DDC incorporates the following technical strategies:
VOQ+Cell-Based Forwarding Technique Combats Packet Loss
When the network experiences traffic bursts, the receiver may be unable to keep up with processing, resulting in congestion and packet loss. The DDC system employs a VOQ + cell-based forwarding mechanism, which offers a robust solution. Let's delve into the specific process:
After receiving packets, the sending-side Network Connection Point (NCP) classifies them and stores them in Virtual Output Queues (VOQs). Before forwarding, the NCP checks whether the receiving side has sufficient buffer capacity; only then are packets segmented into cells and dynamically load-balanced across the available fabric links. If the receiver is temporarily unable to process more traffic, packets remain buffered in the VOQs until capacity frees up, which enhances communication stability and bandwidth utilization.
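The toy Python model below sketches this flow under illustrative assumptions (the cell size, buffer size, and all class and function names are invented for the example): a VOQ holds packets for one destination, a credit check against the receiver's free buffer gates transmission, and granted packets are segmented into cells that are sprayed across the fabric links.

```python
from collections import deque

# A toy model of the VOQ + cell-based forwarding flow described above.
# Sizes and names are illustrative, not from any real DDC implementation.

CELL_SIZE = 256  # bytes per cell (illustrative)

class Receiver:
    def __init__(self, buffer_bytes):
        self.free = buffer_bytes
    def grant(self, size):
        """Admit a packet only if buffer space remains (credit check)."""
        if size <= self.free:
            self.free -= size
            return True
        return False

def forward(voq, receiver, links):
    """Drain one VOQ: segment granted packets into cells, spray across links."""
    while voq and receiver.grant(len(voq[0])):
        pkt = voq.popleft()
        cells = [pkt[i:i + CELL_SIZE] for i in range(0, len(pkt), CELL_SIZE)]
        for n, cell in enumerate(cells):
            links[n % len(links)].append(cell)  # dynamic load balancing
    # Packets that were not granted simply stay buffered in the VOQ.

voq = deque([b"x" * 700, b"y" * 700])          # queue for one destination
links = [[], [], []]                            # three fabric links
forward(voq, Receiver(buffer_bytes=800), links)
print(len(voq), [len(l) for l in links])        # 1 packet held back; cells sprayed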
PFC Single-Hop Deployment Entirely Avoids Deadlock
RDMA lossless networks rely on PFC (Priority Flow Control) for traffic control, dividing an Ethernet link into multiple virtual channels, each with an assigned priority. However, PFC implementation is not without challenges: in multi-level switching topologies, pause frames can propagate in a cycle of buffer dependencies, creating a deadlock in which no switch can drain its queues.
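The small sketch below illustrates PFC's per-priority behavior under an invented threshold: when one priority queue crosses its XOFF threshold, only that priority is paused, while the other priorities keep flowing. It models the pause decision only, not real pause frames or timers.

```python
# A toy illustration of PFC's per-priority pause behavior. The threshold and
# the class design are illustrative, not from any real switch implementation.

XOFF_THRESHOLD = 10  # cells buffered before this priority is paused

class PfcPort:
    def __init__(self):
        self.queues = {prio: 0 for prio in range(8)}  # 8 PFC priorities
        self.paused = set()

    def enqueue(self, prio):
        if prio in self.paused:
            return False                     # upstream must hold this priority
        self.queues[prio] += 1
        if self.queues[prio] >= XOFF_THRESHOLD:
            self.paused.add(prio)            # pause this priority only
        return True

port = PfcPort()
for _ in range(12):
    port.enqueue(3)                          # congest priority 3
print(port.enqueue(3))  # False: priority 3 is paused
print(port.enqueue(0))  # True: other priorities are unaffected
```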
In a DDC network, a distinctive advantage arises from treating all Network Connection Points (NCPs) and Network Communication Functions (NCFs) as a single cohesive entity, eliminating the need for multi-level switching between them; PFC therefore only needs to be deployed on a single hop. Consequently, the circular buffer dependencies that cause PFC deadlock cannot form, and the DDC architecture effectively circumvents the problem, ensuring seamless and uninterrupted network operation.
Distributed OS Improves Reliability
Within the DDC architecture, the management function is centralized under the control of the Network Control Card (NCC). However, this centralized control presents the potential risk of a single point of failure. To mitigate this issue, DDC incorporates a distributed operating system, enabling individual management capabilities for each Network Connection Point (NCP) and Network Communication Function (NCF). This distributed approach includes independent control and management planes for enhanced system reliability and simplified deployment processes.
Conclusion
DDC effectively addresses the network demands associated with large-scale AI model training through its distinctive technical strategies. Moreover, it meticulously optimizes numerous aspects to ensure the network's stability and efficiency in diverse and intricate scenarios.