ICCV'23 paper: Householder Projector for Unsupervised Latent Semantics Discovery. The extended version has been submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence and is currently under review.
(1) Illustration of how our Householder Projector represents the modulation weight A of StyleGANs. (2) The EG3D framework integrated with our proposed projector. (3) The Diffusion Autoencoder (DiffAE) integrated with our projector.
Some identified attributes in StyleGAN2/StyleGAN3.
Some identified attributes in DiffAE.
This paper proposes the Householder Projector, a flexible and general low-rank orthogonal matrix representation based on Householder transformations, to parameterize the projection matrix of GANs and diffusion models. The orthogonality guarantees that the eigenvectors correspond to disentangled, interpretable semantics, while the low-rank property encourages each identified direction to have meaningful variations. We integrate our projector into pre-trained StyleGAN2/StyleGAN3/DiffAE models and evaluate them on several benchmarks. With only about 1% of the original training steps used for fine-tuning, our projector helps GANs and diffusion models discover more disentangled and precise semantic attributes without sacrificing image fidelity.
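For intuition, below is a minimal PyTorch sketch of the core idea: an orthogonal matrix is accumulated as a product of Householder reflections, and the modulation weight is re-parameterized as a low-rank product U diag(sigma) V^T. All names, shapes, and defaults are our own illustrative assumptions, not the repository's actual implementation.

```python
# Minimal sketch of a Householder-parameterized low-rank projector.
# Illustrative only: names, shapes, and defaults are assumptions,
# not the repository's actual API.
import torch
import torch.nn as nn


def householder_product(vecs: torch.Tensor) -> torch.Tensor:
    """Accumulate Q = H_k ... H_2 H_1 from k Householder vectors.

    Each H_i = I - 2 v_i v_i^T / (v_i^T v_i) is orthogonal, so the
    product Q is orthogonal by construction, with no explicit
    constraint or re-orthogonalization needed during training.
    """
    n = vecs.shape[1]
    Q = torch.eye(n, device=vecs.device, dtype=vecs.dtype)
    for v in vecs:
        v = v / v.norm()
        Q = Q - 2.0 * torch.outer(v, v @ Q)  # applies H @ Q without forming H
    return Q


class HouseholderProjector(nn.Module):
    """Re-parameterize a weight as U diag(sigma) V^T with orthogonal U, V."""

    def __init__(self, out_dim: int, in_dim: int, rank: int = 10):
        super().__init__()
        self.rank = rank
        # Householder vectors that generate the orthogonal factors.
        self.u_vecs = nn.Parameter(torch.randn(rank, out_dim))
        self.v_vecs = nn.Parameter(torch.randn(rank, in_dim))
        self.sigma = nn.Parameter(torch.ones(rank))  # low-rank spectrum

    def forward(self) -> torch.Tensor:
        U = householder_product(self.u_vecs)[:, : self.rank]  # (out, r)
        V = householder_product(self.v_vecs)[:, : self.rank]  # (in, r)
        return U @ torch.diag(self.sigma) @ V.T  # (out, in) weight
```

The columns of V are mutually orthonormal by construction, so they can serve directly as latent directions for semantic editing without any extra orthogonality penalty during training.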
conda env create -f householdergan.yml
conda activate householdergan
All datasets can be downloaded from their official websites. For StyleGAN2 pre-processing, please check and run prepare_data.py. For StyleGAN3 pre-processing, please check and run dataset_tool.py. For DiffAE pre-processing, please download the official LSUN Bedroom/Horse and FFHQ datasets and put the lmdb files under the diffae/datasets/ directory. Example invocations are shown below.
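For reference, the upstream tools are commonly invoked as follows; the exact flags may differ in this repository, so treat these as hedged examples and check the scripts themselves (all paths are placeholders):

python prepare_data.py --out [lmdb_path] --n_worker 8 --size 256,1024 [image_folder]

python dataset_tool.py --source=[image_folder] --dest=[dataset.zip] --resolution=512x512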
Training on FFHQ:
python -m torch.distributed.launch --nproc_per_node=4 --master_port=9032 \
train_1024.py --batch 8 [dataset_path] \
--ckpt [pretrained_model] --size 1024 --ortho_id -2 --iter 10000000 \
--checkpoints_dir [save_model_path] \
--sample_dir [save_sample_path] --loadd --training_FULL --diag_size 10 &
Test on FFHQ:
python closed_form_factorization.py --out [factor_path] [save_model_path] --is_ortho &
wait
python apply_factor.py --output_dir [save_results_path] \
--ckpt [save_model_path] \
--factor [factor_path] --ortho_id -2 --size 1024 &
wait
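Conceptually, closed_form_factorization.py performs SeFa-style closed-form factorization: the semantic directions are the right singular vectors of the modulation weight, which under our orthogonal parameterization coincide with the learned orthogonal factor V. A simplified, hedged sketch (names are illustrative, not the script's API):

```python
# Simplified sketch of closed-form semantic factorization (SeFa-style).
# Illustrative only: the actual script also handles checkpoint loading
# and the selection/concatenation of modulation layers.
import torch


def factorize_weight(weight: torch.Tensor, k: int = 10) -> torch.Tensor:
    """Top-k latent editing directions for a modulation weight of shape (out, in)."""
    # Right singular vectors live in the latent space; with the
    # Householder parameterization they are exactly the columns of V.
    _, _, Vh = torch.linalg.svd(weight, full_matrices=False)
    return Vh[:k]  # (k, in): each row is one editing direction


def edit_latent(w: torch.Tensor, direction: torch.Tensor, alpha: float) -> torch.Tensor:
    """Move a latent code along a discovered direction, then decode as usual."""
    return w + alpha * direction
```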
Evaluation (FID, PPL, PIPL)
python fid.py [save_model_path] \
--ortho_id -2 \
--inception inception_ffhq.pkl \
--size 1024
wait
python closed_form_factorization.py --out [factor_path] \
[save_model_path] --is_ortho --diag_size 10 &
wait
python ppl_sefa.py [save_model_path] \
--factor [factor_path] --ortho_id -2 \
--sampling full --eps 1.0 --size 1024 &
wait
python ppl.py [save_model_path] --ortho_id -2 --sampling full --size 1024 &
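For reference, ppl.py computes the Perceptual Path Length of Karras et al.; below is a hedged, simplified sketch (real implementations slerp in Z and lerp in W, batch the computation, and may crop faces). pipl.py presumably evaluates the analogous path length along the identified semantic directions. Names such as G.z_dim are assumptions:

```python
# Hedged, simplified sketch of Perceptual Path Length (PPL).
# Illustrative only: the actual ppl.py batches this and interpolates
# properly (slerp in Z, lerp in W); this just conveys the idea.
import torch
import lpips  # pip install lpips


@torch.no_grad()
def ppl(G, n_pairs: int = 1000, eps: float = 1e-1, device: str = "cuda"):
    """Mean LPIPS distance between images at nearby latents, scaled by 1/eps^2."""
    lpips_fn = lpips.LPIPS(net="vgg").to(device)
    dists = []
    for _ in range(n_pairs):
        z0 = torch.randn(1, G.z_dim, device=device)  # G.z_dim: assumed attribute
        z1 = torch.randn(1, G.z_dim, device=device)
        t = torch.rand((), device=device)  # '--sampling full': t ~ U[0, 1]
        img_a = G(torch.lerp(z0, z1, t))
        img_b = G(torch.lerp(z0, z1, t + eps))
        dists.append(lpips_fn(img_a, img_b).squeeze() / eps**2)
    return torch.stack(dists).mean().item()
```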
For the hyper-parameters of each dataset (e.g., gamma), please refer to the original StyleGAN3 training configuration. Here we only show the training script for AFHQv2:
python train.py --outdir=[save_sample_path] --cfg=stylegan3-r \
--data=[dataset_path] \
--gpus=3 --batch-gpu=2 --batch=6 --gamma=16.4 --mbstd-group 2 \
--resume=[pretrained_model] \
--diag_size 10 --is_ortho True --snap 5
Test on AFHQv2:
python closed_form_factorization.py --out [factor_path] \
--resume_pkl [save_model_path] \
--is_ortho &
wait
python apply_factor.py --outdir=[save_results_path] --cfg=stylegan3-r \
--data=[dataset_path] \
--gpus=1 --batch-gpu=1 --batch=1 --gamma=16.4 --mbstd-group 2 \
--resume=[save_model_path] \
--diag_size 10 --is_ortho True --factor [factor_path]
wait
cd StyleGANHuman/training_scripts/sg3/
Training on SHHQv1:
python train.py --outdir=[save_results_path] --cfg=stylegan3-r --gpus=4 --batch=16 --gamma=12.4 --mbstd-group 4 \
--mirror=1 --aug=noaug --data=[dataset_path] --square=False --snap=5 \
--resume=[pretrained_model] --diag_size 10 --is_ortho True
Test on SHHQv1:
python closed_form_factorization.py --out [factor_path] \
--resume_pkl [save_model_path] \
--is_ortho &
wait
python apply_factor.py --outdir=[save_results_path] --cfg=stylegan3-r \
--data=[dataset_path] \
--gpus=1 --batch-gpu=1 --batch=1 --gamma=16.4 --mbstd-group 1 \
--resume=[save_model_path] \
--diag_size 10 --is_ortho True --factor [factor_path]
wait
cd diffae/
Training on FFHQ:
python run_ffhq128.py
Test on FFHQ:
python closed_form_factorization.py --out [factor_path] [checkpoint_path] --is_ortho &
wait
python apply_factor.py --output_dir [output_path] --ckpt [checkpoint_path] --factor [factor_path] --size 128
wait
Evaluation (PPL, PIPL):
python ppl.py --ckpt [checkpoint_path] --sampling full --eps 1e-1 --size 128 &
wait
python pipl.py --ckpt [checkpoint_path] --factor [factor_path] --sampling full --eps 1e-1 --size 128 &
wait
(Don't forget to set is_ortho=True in the model!)
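Where that switch lives depends on the config; as a purely hypothetical illustration (the module path and constructor name below are assumed from the upstream DiffAE templates, not verified against this repository):

```python
# Hypothetical illustration of the note above; check the DiffAE config
# classes in this repository for the actual field location.
from templates import ffhq128_autoenc_130M  # assumed upstream DiffAE config

conf = ffhq128_autoenc_130M()
conf.is_ortho = True  # enable the Householder projector branch
```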
We release the pre-trained StyleGANs and our fine-tuned models at different resolutions.
Dataset | Backbone | Resolution | Fine-tuned Model | Pre-trained Model |
---|---|---|---|---|
FFHQ | StyleGAN2 | 256x256 | 🔗 | 🔗 |
FFHQ | StyleGAN2 | 1024x1024 | 🔗 | 🔗 |
LSUN Church | StyleGAN2 | 256x256 | 🔗 | 🔗 |
LSUN Cat | StyleGAN2 | 256x256 | 🔗 | 🔗 |
AFHQv2 | StyleGAN3 | 512x512 | 🔗 | 🔗 |
MetFaces | StyleGAN3 | 1024x1024 | 🔗 | 🔗 |
SHHQv1 | StyleGAN3 | 512x256 | 🔗 | 🔗 |
If you find the code helpful for your research, please consider citing our paper:
@inproceedings{song2023householder,
title={Householder Projector for Unsupervised Latent Semantics Discovery},
author={Song, Yue and Zhang, Jichao and Sebe, Nicu and Wang, Wei},
booktitle={ICCV},
year={2023}
}
If you have any questions or suggestions, please feel free to contact us at yue.song@unitn.it, jichao.zhang@unitn.it, or chenyu.zhang@unitn.it.