Writing distributed code that is agnostic to platforms, hardware configurations (GPUs, TPUs) and communication frameworks is tedious. In this blog, we will discuss how PyTorch-Ignite solves this problem with minimal code changes.
This blog assumes you have some knowledge about:
Thus, you will now be able to run the same version of the code across all supported backends seamlessly:
- backends from the native torch distributed configuration: nccl, gloo
- Horovod framework with gloo or nccl communication backend
- XLA on TPUs via pytorch/xla
In this blog post we will compare PyTorch-Ignite's API with native PyTorch distributed code, highlighting the differences and the ease of use of the former. We will also show how Ignite's helper methods automatically make your code compatible with the aforementioned distributed backends, so that you only have to bring your own model, optimizer and data loader objects.
Code snippets, as well as commands for running all the scripts, are provided in a separate repository.
Then we will also cover several ways of spawning processes, via torch.multiprocessing.spawn and via multiple distributed launchers, in order to highlight how PyTorch-Ignite's idist Parallel context manager handles them all without any changes to the code, in particular: torch.distributed.launch, horovodrun and slurm.
More information on the launcher experiments can be found here.
We need to write different code for different distributed backends. This can be tedious, especially if you would like to run your code on different hardware configurations. PyTorch-Ignite's idist will do all the work for you, owing to its high-level helper methods.
The idist.auto_model() method adapts the model wrapping logic for non-distributed and available distributed configurations. Here are the equivalent code snippets for distributed model instantiation:
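For instance, wrapping a model with idist.auto_model() could look like the following sketch; the toy model itself is an illustrative assumption, not the blog's original snippet:

```python
import torch.nn as nn
import ignite.distributed as idist

# A toy model used only for illustration
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

# auto_model moves the model to the current device and, when a distributed
# configuration is detected, wraps it appropriately (e.g. with
# DistributedDataParallel); in a non-distributed run it is left as-is.
model = idist.auto_model(model)
```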
Additionally, it is also compatible with NVIDIA/apex:
```python
model, optimizer = amp.initialize(model, optimizer, opt_level=opt_level)
model = idist.auto_model(model)
```
and with Torch native AMP:
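For instance, the wrapped model can be used inside a torch.cuda.amp autocast region as usual; this is a minimal sketch with a toy model and random data (assumptions, not the original snippet):

```python
import torch
import torch.nn as nn
import ignite.distributed as idist
from torch.cuda.amp import autocast, GradScaler

use_amp = torch.cuda.is_available()
device = idist.device()

# Toy model and data, used only for illustration
model = idist.auto_model(nn.Linear(10, 2))
criterion = nn.CrossEntropyLoss()
x = torch.randn(8, 10).to(device)
y = torch.randint(0, 2, (8,)).to(device)

scaler = GradScaler(enabled=use_amp)

# The auto_model-wrapped model is used inside autocast like any other module
with autocast(enabled=use_amp):
    loss = criterion(model(x), y)
scaler.scale(loss).backward()
```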
The idist.auto_optim() method adapts the optimizer logic for non-distributed and available distributed configurations seamlessly. Here are the equivalent code snippets for distributed optimizer instantiation:
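A minimal sketch of the optimizer helper; the model and optimizer choice here are illustrative assumptions:

```python
import torch
import torch.nn as nn
import ignite.distributed as idist

model = idist.auto_model(nn.Linear(10, 2))

# auto_optim returns the optimizer unchanged for non-distributed and native
# distributed runs, and wraps it when the backend requires it
# (e.g. Horovod's distributed optimizer, XLA-aware stepping).
optimizer = idist.auto_optim(torch.optim.SGD(model.parameters(), lr=0.01))
```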
The idist.auto_dataloader() method adapts the data loading logic for non-distributed and available distributed configurations seamlessly on the target devices. It automatically scales the batch size according to the distributed configuration context, resulting in a general way of loading sample batches on multiple devices.
Here are the equivalent code snippets for the distributed data loading step:
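As a rough illustration (the dataset and parameters below are placeholders):

```python
import torch
import ignite.distributed as idist
from torch.utils.data import TensorDataset

# Dummy dataset used only for illustration
dataset = TensorDataset(torch.randn(1024, 10), torch.randint(0, 2, (1024,)))

# auto_dataloader adds a DistributedSampler and adapts the batch size and
# number of workers to the current distributed configuration when needed.
train_loader = idist.auto_dataloader(dataset, batch_size=32, num_workers=2, shuffle=True)
```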
idist also provides collective operations like all_reduce, all_gather and broadcast that can be used with all supported distributed frameworks. Please see our documentation for more details.
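For example, averaging a value across all processes might look like the following sketch (the tensor here is illustrative):

```python
import torch
import ignite.distributed as idist

# Each process contributes its local value; all_reduce sums them across all
# participating processes, whatever the backend (and is a no-op otherwise).
local_loss = torch.tensor(0.5)
mean_loss = idist.all_reduce(local_loss) / idist.get_world_size()

# broadcast sends a value from the source process (src) to all others
value = idist.broadcast(torch.tensor(42.0), src=0)
```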
The code snippets below highlight the API specificities of each of the distributed backends for the same use case, as compared to the PyTorch-Ignite API. Backend-specific code is available for native DDP, Horovod and XLA/TPU.
PyTorch-Ignite's unified code snippet can be run with the standard PyTorch backends such as nccl, and also with Horovod and with XLA on TPU devices. Note that the code is less verbose, yet the user still retains full control of the training loop.
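As a rough sketch of what such a unified training function might look like (the model, data and hyperparameters are illustrative assumptions, not the blog's exact snippet), the same function body runs unchanged on all backends and is launched through the idist Parallel context manager discussed below:

```python
import torch
import torch.nn as nn
import ignite.distributed as idist
from torch.utils.data import TensorDataset


def training(local_rank, config):
    device = idist.device()

    # Toy dataset for illustration; the auto_* helpers adapt everything
    # to the current (possibly non-) distributed configuration.
    dataset = TensorDataset(torch.randn(1024, 10), torch.randint(0, 2, (1024,)))
    train_loader = idist.auto_dataloader(dataset, batch_size=config["batch_size"], shuffle=True)

    model = idist.auto_model(nn.Linear(10, 2))
    optimizer = idist.auto_optim(torch.optim.SGD(model.parameters(), lr=0.01))
    criterion = nn.CrossEntropyLoss()

    for epoch in range(config["num_epochs"]):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
```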
The following examples are introductory. For a more robust, production-grade example that uses PyTorch-Ignite, refer here.
The complete source code of these experiments can be found here.
idist also unifies the way distributed code is launched and makes the distributed configuration setup easier with the ignite.distributed.launcher.Parallel (idist Parallel) context manager.
This context manager can either spawn nproc_per_node (passed as a script argument) child processes and initialize a processing group according to the provided backend, or work with tools like torch.distributed.launch, horovodrun and slurm by initializing the processing group given only the backend argument, in a general way.
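A minimal sketch of how this might look in a script such as ignite_idist.py (the argument parsing and training function are assumptions):

```python
import argparse
import ignite.distributed as idist


def training(local_rank, config):
    # model/optimizer/dataloader setup with the auto_* helpers goes here
    print(idist.get_rank(), ": run with config:", config)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--backend", type=str, default=None)
    parser.add_argument("--nproc_per_node", type=int, default=None)
    args = parser.parse_args()

    config = {"batch_size": 32, "num_epochs": 1}

    # Parallel spawns nproc_per_node child processes (or cooperates with an
    # external launcher) and sets up the processing group for the backend.
    with idist.Parallel(backend=args.backend, nproc_per_node=args.nproc_per_node) as parallel:
        parallel.run(training, config)
```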
In this case, idist Parallel uses the native torch.multiprocessing.spawn method under the hood in order to run the distributed configuration. Here, nproc_per_node is passed as a script argument:
Running multiple distributed configurations with a single script. Source: ignite_idist.py:
```bash
# Running with gloo
python -u ignite_idist.py --nproc_per_node 2 --backend gloo

# Running with nccl
python -u ignite_idist.py --nproc_per_node 2 --backend nccl

# Running with horovod with gloo controller (gloo or nccl support)
python -u ignite_idist.py --backend horovod --nproc_per_node 2

# Running on xla/tpu
python -u ignite_idist.py --backend xla-tpu --nproc_per_node 8 --batch_size 32
```
The idist Parallel context manager is also compatible with multiple distributed launchers. Here we are using the torch.distributed.launch script in order to spawn the processes:
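For example, on a single node with 2 processes it could look like this (a sketch; the --use_env flag is assumed here and exact flags may vary with your PyTorch version):

```bash
python -m torch.distributed.launch --nproc_per_node=2 --use_env ignite_idist.py --backend nccl
```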
In order to run this example without going through the installation procedure, you can pull one of PyTorch-Ignite's Docker images with pre-installed Horovod. It includes Horovod with the gloo controller and nccl support.
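Inside such an image the script can then be launched through horovodrun, for example (a sketch):

```bash
horovodrun -np 2 python ignite_idist.py --backend horovod
```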
The same result can be achieved by using
slurm without any
modification to the code:
```bash
srun --nodes=2 \
     --ntasks-per-node=2 \
     --job-name=pytorch-ignite \
     --time=00:01:00 \
     --partition=gpgpu \
     --gres=gpu:2 \
     --mem=10G \
     python ignite_idist.py --backend nccl
```
Alternatively, the job can be submitted with sbatch script.bash, with the script file containing the equivalent resource options:
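A possible script.bash could look like this sketch, mirroring the srun options above:

```bash
#!/bin/bash
#SBATCH --job-name=pytorch-ignite
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --time=00:01:00
#SBATCH --partition=gpgpu
#SBATCH --gres=gpu:2
#SBATCH --mem=10G

srun python ignite_idist.py --backend nccl
```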
As we saw in the above examples, managing multiple configurations and specifications for distributed computing has never been easier. In just a few lines, we can parallelize our code and run it on a variety of hardware and backends, while maintaining control and simplicity.
PyTorch-Ignite is currently maintained by a team of volunteers and we are looking for more contributors. See CONTRIBUTING.md for how you can contribute.