DeepSpeed Mixture-of-Quantization (MoQ)
DeepSpeed introduces new support for model compression using quantization, called Mixture-of-Quantization (MoQ). MoQ is designed on top of QAT (Quantization-Aware Training).
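As a reference point, below is a minimal sketch of how MoQ-style quantized training could be enabled through the DeepSpeed configuration passed to `deepspeed.initialize`. The `quantize_training` section and its keys (`quantize_bits`, `quantize_schedule`, `quantize_groups`) are written from the MoQ tutorial's example config but should be treated as assumptions and verified against the current DeepSpeed configuration documentation; the model and hyperparameters are placeholders.

```python
# Minimal sketch: enabling MoQ-style quantized training via the DeepSpeed config.
# The "quantize_training" keys below are assumptions based on the MoQ tutorial;
# verify them against the DeepSpeed configuration documentation.
import torch
import deepspeed

# Placeholder model standing in for a real transformer model.
model = torch.nn.Linear(768, 768)

ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "optimizer": {"type": "AdamW", "params": {"lr": 2e-5}},
    "quantize_training": {
        "enabled": True,
        # Start quantization at 16 bits and step down to the 8-bit target.
        "quantize_bits": {"start_bits": 16, "target_bits": 8},
        # Reduce precision on a fixed schedule (every `quantize_period` steps).
        "quantize_schedule": {"quantize_period": 400, "schedule_offset": 0},
        "quantize_groups": 8,
    },
}

# deepspeed.initialize wraps the model in an engine that applies the config,
# including the quantization schedule, during training.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

The schedule captures the core MoQ idea: training begins at a high precision (here 16-bit) and the quantization bit-width is reduced toward the target (here 8-bit) over the course of training, rather than quantizing to the target precision from the first step.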