UCCL-EP
MoE DeepEP RDMA Expert Parallelism AMD EFA 2026-03-28

UCCL-EP: Portable Expert-Parallel Communication — Full Results

Full evaluation of UCCL-EP across NVIDIA and AMD GPUs, AWS EFA, InfiniBand, and Broadcom NICs, with application-level results on SGLang inference and Megatron-LM training.

UCCL-EP
MoE DeepEP IBGDA RDMA 2025-10-27

Previewing UCCL-EP: Flexible and Efficient Expert Parallelism for Cloud and Beyond

GPU-driven communication (e.g., DeepEP) is the key to efficient and large-scale EP, but it cannot run on heterogeneous platforms in the public cloud due to tight coupling between GPU and NIC.

KV transfer engine
NIXL NCCL RCCL Mooncake RDMA 2025-08-13

Everything You Want to Know about KV Cache Transfer Engine

Many KV cache transfer engines exist for PD disaggregation, but there are almost no benchmarks of their performance. This post fills that gap by benchmarking and analyzing the performance of v...

About
NCCL RCCL RDMA 2025-06-30

How to Debug NCCL Performance Issues for ML Workloads?

NCCL is notoriously hard to debug. In this post, we walk through our journey of debugging NCCL performance issues and show how UCCL can help with this process.

About
Networking AI RDMA 2025-05-26

UCCL-Tran: An Extensible Software Transport Layer for GPU Networking

UCCL-Tran is designed to be fast and extensible to meet the challenging requirements of modern ML/LLM workloads.

About
Networking AI Sky Computing 2025-05-26

About Us

About the UCCL team

UCCL © 2026
UC Berkeley Sky Lab UC Davis ArtSy Lab