
flocoder

This is a (Work In Progress!) teaching and research package for exploring latent generative flow matching models. (The name is inspired by "vocoder.")

This project started as a way to provide a lightweight, fast (and interpretable?) upgrade to the diffusion-model system Pictures of MIDI for MIDI piano-roll images, but flocoder is intended to work on more general datasets too.

Quickstart

Head over to notebooks/SD_Flower_Flow.ipynb and run through it for a taste. It will run on Colab.

Overview

Check out the sets of slides linked from notebooks/README.md.

Architecture Overview

[Figure: MIDI Flow Architecture]

The above diagram illustrates the architecture of our intended model: a VQVAE compresses MIDI data into a discrete latent space, while a flow model learns to generate new samples in the continuous latent space.

We can also flow in the continuous latent space of a VAE such as Stable Diffusion's, which may be an easier starting point.
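In code terms, the data flow is: encode an image to a small latent, run the flow model entirely in that latent space, then decode back. Here is a toy sketch (in Python/PyTorch) of the shapes involved; the modules are stand-ins, not the real networks:

import torch
import torch.nn as nn

# Stand-ins for the two trained components (illustrative only):
encoder = nn.Conv2d(3, 4, kernel_size=8, stride=8)           # image -> latent (8x spatial compression)
decoder = nn.ConvTranspose2d(4, 3, kernel_size=8, stride=8)  # latent -> image
velocity = nn.Conv2d(4, 4, kernel_size=3, padding=1)         # the flow model lives in latent space

x = torch.randn(1, 3, 128, 128)  # a piano-roll (or flower) image
z = encoder(x)                   # (1, 4, 16, 16): the latent the flow model sees
v = velocity(z)                  # velocity field evaluated at z (time input omitted for brevity)
x_hat = decoder(z)               # back to (1, 3, 128, 128)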

Installation

# Clone the repository
git clone https://github.com/drscotthawley/flocoder.git
cd flocoder

# Install uv if not already installed
# On macOS/Linux:
# curl -LsSf https://astral.sh/uv/install.sh | sh
# On Windows PowerShell:
# irm https://astral.sh/uv/install.ps1 | iex

# Create a virtual environment with uv, specifying Python 3.10
uv venv --python=python3.10

# Activate the virtual environment
# On Linux/macOS:
source .venv/bin/activate
# On Windows:
# .venv\Scripts\activate

# Install the package in editable mode (See below if you get NATTEN errors!)
uv pip install -e .

# Recommended: Install development dependencies (jupyter, others...)
uv pip install -e ".[dev]"

# Recommended: install NATTEN separately with special flags
uv pip install natten --no-build-isolation
# if that fails, see NATTEN's install instructions (https://github.com/SHI-Labs/NATTEN/blob/main/docs/install.md)
# and specify exact version number, e.g.
# uv pip install natten==0.17.5+torch260cu126 -f https://shi-labs.com/natten/wheels/
# or build from the latest source, e.g.:
# uv pip install --no-build-isolation git+https://github.com/SHI-Labs/NATTEN

Project Structure

The project is organized as follows:

  • flocoder/: Main package code
  • scripts/: Training and evaluation scripts
  • configs/: Configuration files for models and training
  • notebooks/: Jupyter notebooks for tutorials and examples
  • tests/: Unit tests

Training

The package includes multiple training scripts, located in the main directory.

You can skip the autoencoder/"codec" training if you'd rather use the pretrained Stable Diffusion VAE. In that case, for everything that follows, set:

export CONFIG_FILE=flowers_sd.yaml
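For orientation, here's a minimal sketch (in Python) of what encoding and decoding with a pretrained Stable Diffusion VAE looks like via the diffusers library; the model ID and tensor shapes are illustrative, and flocoder's own wrapper may differ:

import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")  # auto-downloads weights
vae.eval()

x = torch.randn(1, 3, 256, 256)  # stand-in for a batch of RGB images scaled to [-1, 1]
with torch.no_grad():
    z = vae.encode(x).latent_dist.sample()  # continuous latents, here (1, 4, 32, 32)
    x_rec = vae.decode(z).sample            # decode back to image space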

Optional: Training a VQGAN

You can use the Stable Diffusion VAE to get started quickly (it will auto-download), but if you want to train your own...

export CONFIG_FILE=flowers_vqgan.yaml 
#export CONFIG_FILE=midi.yaml 
./train_vqgan.py --config-name $CONFIG_FILE

The autoencoder, AKA the "codec" (e.g. a VQGAN), compresses piano-roll images into a quantized latent representation. Training saves checkpoints in the checkpoints/ directory; you'll use one of those checkpoints to pre-encode your data in the next section.
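For intuition about what "quantized" means here, below is a minimal sketch of the vector-quantization bottleneck at the heart of a VQVAE/VQGAN; the dimensions are toy values, and flocoder's actual codec differs in detail:

import torch

codebook = torch.randn(512, 16)               # 512 learnable code vectors of dimension 16
z_e = torch.randn(8, 16, requires_grad=True)  # toy encoder outputs (8 latent vectors)

# Nearest-neighbor lookup: snap each encoder output to its closest code vector.
dists = torch.cdist(z_e, codebook)  # (8, 512) pairwise distances
indices = dists.argmin(dim=1)       # discrete codes, one per latent vector
z_q = codebook[indices]             # quantized latents

# Straight-through estimator: the decoder sees z_q, but gradients
# pass back to the encoder as if quantization were the identity.
z_q_st = z_e + (z_q - z_e).detach()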

Pre-Encoding Data (with frozen augmentations)

Pre-encoding takes about 20 minutes on a single GPU.

./preencode_data.py --config-name $CONFIG_FILE
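Conceptually, pre-encoding trades disk space for training speed: each image is augmented a fixed number of times ("frozen augmentations"), pushed through the frozen codec, and the latents are saved so the flow trainer never has to run the encoder. A hypothetical sketch (all names here are illustrative, not flocoder's actual API):

import torch

@torch.no_grad()
def preencode(codec, dataset, n_augs=8):
    """Encode n_augs augmented copies of each image with a frozen codec."""
    codec.eval()
    latents = []
    for x in dataset:  # x: one image tensor, (C, H, W)
        for _ in range(n_augs):
            # toy augmentation: random horizontal flip
            x_aug = torch.flip(x, dims=[-1]) if torch.rand(()) < 0.5 else x
            latents.append(codec.encode(x_aug.unsqueeze(0)))
    return torch.cat(latents)  # written to disk once, reused every epoch

# e.g., with any module exposing .encode(); here an identity stand-in:
class ToyCodec(torch.nn.Module):
    def encode(self, x): return x

zs = preencode(ToyCodec(), [torch.randn(3, 64, 64) for _ in range(4)])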

Training the Flow Model

./train_flow.py --config-name $CONFIG_FILE

The flow model operates in the latent space created by the autoencoder.
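Under the hood, the training objective is standard (conditional) flow matching: pick a random point on the straight line between a noise sample and a data latent, and regress the network's predicted velocity onto that line's constant slope. A minimal sketch of one training step, with a toy MLP standing in for the real network:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(17, 64), nn.SiLU(), nn.Linear(64, 16))  # toy velocity field

z1 = torch.randn(32, 16)   # batch of (pre-encoded) data latents
z0 = torch.randn_like(z1)  # matched noise samples
t = torch.rand(32, 1)      # random times in [0, 1]

zt = (1 - t) * z0 + t * z1  # point on the straight noise-to-data path
v_target = z1 - z0          # that path's constant velocity
v_pred = model(torch.cat([zt, t], dim=1))
loss = ((v_pred - v_target) ** 2).mean()  # flow-matching regression loss
loss.backward()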

Generating Samples

# Generate new MIDI samples
./generate_samples.py --config-name $CONFIG_FILE
# or with optional gradio UI:
#./generate_samples.py --config-name $CONFIG_FILE +use_gradio=true

This generates new samples by drawing latents from the flow model and decoding them through the codec (e.g. the VQVAE).
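In other words, sampling integrates the learned velocity field from noise at t=0 to a data latent at t=1, then decodes the endpoint. A minimal Euler-method sketch (the RK4(5) integrator mentioned in the TODO list is a higher-order version of the same idea; names here are illustrative):

import torch

@torch.no_grad()
def euler_sample(model, z0, n_steps=50):
    """Integrate dz/dt = v(z, t) from t=0 (noise) to t=1 (data latent)."""
    z, dt = z0, 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((z.shape[0], 1), i * dt)
        z = z + dt * model(torch.cat([z, t], dim=1))  # one Euler step
    return z

# e.g.: z = euler_sample(model, torch.randn(16, 16)), then decode z with the codec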

Contributing

Contributions are VERY welcome! See Contributing.md. Thanks in advance.

Discussions

Discussions are open! Rather than starting some ad-hoc Discord server, let's share ideas, questions, insights, etc. using the Discussions tab.

TODO

  • Add Discussions area
  • Add Style Guide
  • Replace custom config/CLI arg system with Hydra or other package
  • Rename "vae"/"vqvae"/"vqgan" variable as just "codec"
  • Replace Class in preencode_data.py with functions as per Style Guide
  • Research: Figure out why conditioning fails for latent model
  • Add Standalone sampler script / Gradio demo?
  • Add metrics (to wandb out) to quantify flow training progress (sinkhorn, FID)
  • Add Contributing guidelines
  • Try variable size scheduler
  • Add audio example, e.g. using DAC
  • low-priority: Make RK4(5) integrator fully CUDA-compatible
  • Straighter/OT paths: Add ReFlow, Minibatch OT, Ray's Rays, Curvature penalty,...
  • Add jitter / diffusion for comparison
  • Add Documentation
  • Improve overall introduction/orientation
  • Fix "code smell" throughout -- repeated methods, hard-coded values, etc.
  • Research: Figure out how to accelerate training of flows!!
  • Research: Figure out how to accelerate training of vqgan
  • Research: improve output quality of midi-flow (and midi-vqgan)
  • Inference speedup: investigate model quantization / pruning (pytorch.ao?)
  • Ops: Add tests
  • Ops: Add CI
  • Investigate "Mean Flows for One-step Generative Modeling"

Acknowledgement

This project is generously supported by Hyperstate Music AI.

License

This project is licensed under the terms of the MIT license.
