A GPipe implementation in PyTorch
Updated Jul 25, 2024 - Python
An I/O benchmark for deep learning applications
Very-Low Overhead Checkpointing System
Extending DOLFINx with checkpointing functionality
Keras wrapper that autosaves what ModelCheckpoint cannot.
This Flink project consumes streams from one Azure Event Hub and produces to a different Event Hub; it also includes the config files for deploying it on Kubernetes.
Code and tutorial on integrating wandb sweeps with Slurm pre-emption
A lightweight checkpointing program written in C.
A standalone Flink producer used for testing the contents of the flink-consume-produce-ek repo
DMTCP scripts to get Python scripts working with SLURM.
A shared library to help test your code with failure-injection
A Python package for performing memory-intensive computations in parallel using chunks and checkpointing.
Robust distributed checkpointing and job management system for multi-GPU SLURM workloads
Hangman Game Word Predictor (Character-level attention)
A high-performance command-line tool written in Rust to validate BIP39 mnemonic phrases.
Koo and Toueg’s checkpointing and recovery protocol