TY - GEN
T1 - SIMPLE REFLOW
T2 - 13th International Conference on Learning Representations, ICLR 2025
AU - Kim, Beomsu
AU - Hsieh, Yu Guan
AU - Klein, Michal
AU - Cuturi, Marco
AU - Ye, Jong Chul
AU - Kawar, Bahjat
AU - Thornton, James
N1 - Publisher Copyright:
© 2025 13th International Conference on Learning Representations, ICLR 2025. All rights reserved.
PY - 2025/1/1
Y1 - 2025/1/1
N2 - Diffusion and flow-matching models achieve remarkable generative performance, but at the cost of many sampling steps, which slows inference and limits applicability to time-critical tasks. The ReFlow procedure can accelerate sampling by straightening generation trajectories. However, ReFlow is an iterative procedure, typically requiring training on simulated data, and results in reduced sample quality. To mitigate sample deterioration, we examine the design space of ReFlow and highlight potential pitfalls in prior heuristic practices. We then propose seven improvements for training dynamics, learning, and inference, which are verified with thorough ablation studies on CIFAR10 32 × 32, AFHQv2 64 × 64, and FFHQ 64 × 64. Combining all our techniques, we achieve state-of-the-art FID scores (without/with guidance, resp.) for fast generation via neural ODEs: 2.23/1.98 on CIFAR10, 2.30/1.91 on AFHQv2, 2.84/2.67 on FFHQ, and 3.49/1.74 on ImageNet-64, all with merely 9 neural function evaluations.
AB - Diffusion and flow-matching models achieve remarkable generative performance, but at the cost of many sampling steps, which slows inference and limits applicability to time-critical tasks. The ReFlow procedure can accelerate sampling by straightening generation trajectories. However, ReFlow is an iterative procedure, typically requiring training on simulated data, and results in reduced sample quality. To mitigate sample deterioration, we examine the design space of ReFlow and highlight potential pitfalls in prior heuristic practices. We then propose seven improvements for training dynamics, learning, and inference, which are verified with thorough ablation studies on CIFAR10 32 × 32, AFHQv2 64 × 64, and FFHQ 64 × 64. Combining all our techniques, we achieve state-of-the-art FID scores (without/with guidance, resp.) for fast generation via neural ODEs: 2.23/1.98 on CIFAR10, 2.30/1.91 on AFHQv2, 2.84/2.67 on FFHQ, and 3.49/1.74 on ImageNet-64, all with merely 9 neural function evaluations.
UR - https://www.scopus.com/pages/publications/105010232884
M3 - Conference contribution
AN - SCOPUS:105010232884
T3 - 13th International Conference on Learning Representations, ICLR 2025
SP - 36219
EP - 36245
BT - 13th International Conference on Learning Representations, ICLR 2025
PB - International Conference on Learning Representations, ICLR
Y2 - 24 April 2025 through 28 April 2025
ER -