TL;DR: This paper generalizes the stochastic interpolant formalism so that a single trained model can solve a wide variety of tasks. It provides zero-shot versatility; experimental validations include exact inpainting, editing, deblurring, constrained planning, and exact posterior sampling on high-dimensional datasets.
We propose a framework for learning maps between probability distributions that broadly generalizes the time dynamics of flow and diffusion models.
To enable this, we generalize stochastic interpolants by replacing the scalar time variable with vectors, matrices, or linear operators, allowing us to bridge probability distributions across multiple dimensional spaces.
This approach enables the construction of versatile generative models capable of fulfilling multiple tasks without task-specific training. Our operator-based interpolants not only provide a unifying theoretical perspective on existing generative models but also extend their capabilities. Through numerical experiments, we demonstrate the zero-shot efficacy of our method on conditional generation and inpainting, fine-tuning, posterior sampling, and multiscale modeling, suggesting its potential as a generic, task-agnostic alternative to specialized models.
Generalized stochastic interpolants: We replace the scalar time with linear operators to interpolate between distributions across dimensions. This points toward a more general paradigm of universal generative models that can be trained once and then applied to a variety of objectives.
Unified generative framework: We provide a single formalism for multitask generation with flows and diffusions, without any retraining.
Zero-shot versatility: Perform conditional generation, inpainting, fine-tuning, and posterior sampling with a single model. You can impose hard constraints on the generated data, or perform several generations in succession.
Universal generative models: Amortize training across tasks for scalable, task-agnostic generation. Inpainting and conditional generation can be performed during the same diffusion, saving time and computing resources.
Consider images from the high-dimensional datasets CelebA and AFHQ-Cat. With multitask stochastic interpolants, pixel-wise inpainting becomes a trivial, one-shot task. Here, we consider block and random masking, probing two radically different inpainting situations.
Figure (two examples): original image, block masking, random masking.
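The page does not spell out how a mask becomes an operator. One natural reading, which we assume here, is that $\alpha$ and $\beta$ act as diagonal (pixelwise) operators: observed pixels are pinned to the data ($\alpha = 0$, $\beta = 1$), while masked pixels follow a scalar schedule from noise to data. A minimal NumPy sketch; the linear schedule and all function names are hypothetical.

```python
import numpy as np

def block_mask(h, w, top, left, size):
    """Binary mask (1 = observed, 0 = to inpaint) with one missing square block."""
    mask = np.ones((h, w))
    mask[top:top + size, left:left + size] = 0.0
    return mask

def random_mask(h, w, p_missing, rng):
    """Binary mask where each pixel is missing independently with probability p_missing."""
    return (rng.random((h, w)) >= p_missing).astype(float)

def masked_coefficients(mask, t):
    """Diagonal operators (stored as arrays) for inpainting at progress t in [0, 1].

    Observed pixels: alpha = 0, beta = 1, so they always equal the data.
    Missing pixels: alpha = 1 - t, beta = t (an assumed linear schedule).
    """
    alpha = (1.0 - mask) * (1.0 - t)
    beta = mask + (1.0 - mask) * t
    return alpha, beta

def interpolant(alpha, beta, x0, x1):
    """I = alpha x0 + beta x1; diagonal operators act elementwise."""
    return alpha * x0 + beta * x1

rng = np.random.default_rng(0)
x1 = rng.random((8, 8))             # stand-in for an image
x0 = rng.standard_normal((8, 8))    # Gaussian noise
mask = block_mask(8, 8, top=2, left=2, size=4)
alpha, beta = masked_coefficients(mask, t=0.0)
start = interpolant(alpha, beta, x0, x1)  # observed pixels from x1, the hole filled with noise
```

Under this reading, switching from block to random masking only changes `mask`, which is why one trained model can serve both settings.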
PSNR and SSIM metrics on CelebA and AFHQ-Cat datasets. A single generation was performed on each image. The best result of each category is highlighted in bold font.
| Method | CelebA Random PSNR | CelebA Random SSIM | CelebA Block PSNR | CelebA Block SSIM | AFHQ-Cat Random PSNR | AFHQ-Cat Random SSIM | AFHQ-Cat Block PSNR | AFHQ-Cat Block SSIM |
|---|---|---|---|---|---|---|---|---|
| Degraded | 11.82 | 0.197 | 22.12 | 0.742 | 13.35 | 0.234 | 21.50 | 0.744 |
| pokle2024trainingfree | 28.36 | 0.865 | 28.84 | 0.914 | 28.84 | 0.838 | 23.88 | 0.874 |
| hamu2024dflow | 33.07 | 0.938 | 29.70 | 0.893 | 31.37 | 0.888 | 26.69 | 0.833 |
| zhang2024flow | 32.33 | 0.945 | 29.40 | 0.858 | 31.76 | 0.909 | 25.85 | 0.822 |
| martin2024pnp | 33.54 | 0.953 | **30.59** | **0.943** | 32.98 | 0.930 | 26.87 | 0.904 |
| Ours | **33.76** | **0.967** | 29.98 | 0.938 | **33.11** | **0.945** | **26.96** | **0.914** |
Multitask stochastic interpolants achieve the best PSNR and SSIM in three of the four inpainting settings and remain competitive on CelebA block masking, where martin2024pnp scores highest. The very same model is used for both block and random inpainting.
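PSNR follows the standard definition $10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$. A minimal sketch, assuming pixel values scaled to $[0, 1]$ (the data range behind the table is not stated here); for SSIM, scikit-image provides `skimage.metrics.structural_similarity`.

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))

# A uniform error of 0.1 on a [0, 1] image gives MSE = 0.01, i.e. about 20 dB.
ref = np.zeros((4, 4))
print(round(psnr(ref, np.full((4, 4), 0.1)), 6))  # 20.0
```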
The eyes and mouth are masked and then regenerated in turn: multiple generations are performed in a row on the same image, and the final result does not deviate significantly from the original image, as shown below.
Figure (two examples): original image, masked eyes, generated eyes, generated eyes + masked mouth, generated eyes + generated mouth.
Constrained path planning is a problem rooted in reinforcement learning (RL). Take a maze and two end points: the task is to find the shortest path joining them while respecting the constraints imposed by the environment. With stochastic interpolants, generating a path amounts to fixing the first and last points and letting all intermediate points diffuse.
Constraints need not apply only to the end points: you can also impose hard constraints along the path, provided a feasible path still exists. A small white dot marks a constraint applied to the path, forcing it to take a detour; the path length adapts accordingly.
Shortest path, with no constraint other than the end points.
Path with a constraint at mid-length, here in the bottom-right corner.
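If constraints are encoded the same way as inpainting masks (an assumption on our part), a path is an array of $T$ points and every hard constraint (start, goal, or an intermediate waypoint such as the white dot) is a pinned row with $\alpha = 0$, $\beta = 1$. The sketch below only builds the starting state $I_{\alpha_0,\beta_0}$ and the diagonal coefficients; the trained drifts that move the free points are not shown, and all names are hypothetical.

```python
import numpy as np

def constrained_path_start(T, start, goal, waypoints, rng):
    """Build the pin mask and initial state for a path of T points in 2-D.

    waypoints maps a path index to a required (x, y) position.
    Returns mask of shape (T, 1) (1 = pinned, alpha = 0, beta = 1; 0 = free)
    and x_init of shape (T, 2): pinned rows hold the constraint, free rows hold noise.
    """
    pinned = {0: start, T - 1: goal, **waypoints}
    mask = np.zeros((T, 1))
    x_init = rng.standard_normal((T, 2))
    for index, point in pinned.items():
        mask[index] = 1.0
        x_init[index] = point
    return mask, x_init

def path_coefficients(mask, t):
    """Diagonal operators: pinned rows stay at the data, free rows go from noise to data."""
    return (1.0 - mask) * (1.0 - t), mask + (1.0 - mask) * t

rng = np.random.default_rng(0)
# End points plus one mid-length constraint, mimicking the detour example.
mask, x_init = constrained_path_start(
    T=32, start=(0.0, 0.0), goal=(1.0, 1.0), waypoints={16: (0.9, 0.1)}, rng=rng
)
```

Adding a constraint only adds an entry to `waypoints`; in this reading nothing needs to be retrained.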
With the very same model, you can also perform exact posterior sampling. Take the $\varphi^4$ model: consider the system with energy $E$ in zero magnetic field, and assume field configurations can be sampled with the interpolant coefficients $\alpha$ and $\beta$; this yields the prior distribution. Now place the same system in a magnetic field $h$, so that its energy becomes $E_r=E + (h, \varphi)$. By simply shifting the field configuration during the generation process, $\varphi_r=\varphi + \frac{\alpha^2}{\beta} h$, you can sample from the posterior without retraining or any kind of approximation. The training data were generated with Hamiltonian Monte Carlo (HMC). The figure below shows that, with a single model, stochastic interpolants generate data that reproduce the statistics of the magnetisation $M(\varphi) = \frac1N \sum\limits_{a} \varphi(a)$, where $N$ is the number of lattice sites, for both $h = 0$ and $h \ne 0$.
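Comparing generated samples with the HMC data amounts to computing $M(\varphi)$ for each configuration and comparing histograms. A minimal sketch, assuming configurations are stored as a NumPy array of shape (batch, L, L); it also evaluates the field term $(h, \varphi)$ of $E_r$ as a sum over sites, for a uniform or site-dependent $h$. The energy $E$ itself is not specified here and is omitted; function names and the default histogram range are hypothetical.

```python
import numpy as np

def magnetisation(phi):
    """M(phi) = (1/N) * sum_a phi(a) for each configuration in the batch."""
    phi = np.asarray(phi, dtype=float)
    return phi.reshape(phi.shape[0], -1).mean(axis=1)

def field_term(h, phi):
    """(h, phi) = sum_a h(a) * phi(a) per configuration; h is a scalar or an (L, L) array."""
    phi = np.asarray(phi, dtype=float)
    return (h * phi).reshape(phi.shape[0], -1).sum(axis=1)

def magnetisation_histogram(phi, bins=50, value_range=(-1.5, 1.5)):
    """Normalised histogram of M, e.g. to overlay HMC and generated samples.

    value_range is an assumed plotting range, not taken from the paper.
    """
    density, edges = np.histogram(
        magnetisation(phi), bins=bins, range=value_range, density=True
    )
    return density, edges
```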
A major limitation of HMC in very high dimensions is the low acceptance rate of new configurations once the chain has reached the stationary distribution, so generating a whole dataset of $\varphi^4$ configurations is time-consuming. With our method, sampling a configuration close to another one is straightforward: mask out a subset of spins and regenerate them immediately, as illustrated below.
New operator interpolant
$$ \begin{align*} I_{\alpha,\beta}(x_0, x_1) = \alpha x_0 + \beta x_1, \end{align*} $$
where $x_0 \sim$ noise, $x_1 \sim$ data distribution, and $\alpha,\beta$ are linear operators.
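For data in $\mathbb{R}^d$, linear operators are $d \times d$ matrices, so the definition can be checked directly. In the sketch below (helper names hypothetical), the usual scalar-time interpolant is recovered with $\alpha = (1-t)\,\mathrm{Id}$, $\beta = t\,\mathrm{Id}$, while diagonal operators give each coordinate its own time, which is the vector case. The linear schedule $(1-t, t)$ is only an example.

```python
import numpy as np

def operator_interpolant(alpha, beta, x0, x1):
    """I_{alpha,beta}(x0, x1) = alpha x0 + beta x1.

    alpha, beta: (d, d) matrices; x0, x1: batches of shape (n, d), one sample per row.
    """
    return x0 @ alpha.T + x1 @ beta.T

def scalar_time(t, d):
    """Scalar time as a special case: alpha = (1 - t) Id, beta = t Id."""
    return (1.0 - t) * np.eye(d), t * np.eye(d)

def vector_time(ts):
    """Vector of times: coordinate i sits at time ts[i] (diagonal operators)."""
    ts = np.asarray(ts, dtype=float)
    return np.diag(1.0 - ts), np.diag(ts)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((5, 4))   # noise samples
x1 = rng.random((5, 4))            # stand-in data samples
```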
Associated with this interpolant we introduce:
Denoiser drifts:
$$ \begin{align*} \eta_0(\alpha,\beta,x) &= \mathop{\mathrm{arg\,min}}\limits_{\hat \eta_0} \mathbb{E}_{x_0,x_1,\alpha,\beta}\big[|\hat \eta_0(\alpha,\beta,I_{\alpha,\beta}(x_0,x_1))- x_0|^2\big], \\ \eta_1(\alpha,\beta,x) &= \mathop{\mathrm{arg\,min}}\limits_{\hat \eta_1} \mathbb{E}_{x_0,x_1,\alpha,\beta}\big[|\hat \eta_1(\alpha,\beta,I_{\alpha,\beta}(x_0,x_1))- x_1|^2\big] \end{align*} $$
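In practice $\hat\eta_0$ and $\hat\eta_1$ are neural networks trained on these objectives, whose minimizers are the conditional expectations $\mathbb{E}[x_0 \mid I_{\alpha,\beta} = x]$ and $\mathbb{E}[x_1 \mid I_{\alpha,\beta} = x]$. The sketch below estimates both objectives by Monte Carlo on a toy problem where these expectations have closed forms: $x_0 \sim \mathcal N(0, \mathrm{Id})$, $x_1 \sim \mathcal N(m, s^2\,\mathrm{Id})$, with diagonal operators obtained by drawing an independent time per coordinate. The toy data, the sampling of $(\alpha,\beta)$, and all names are assumptions for illustration.

```python
import numpy as np

M, S, D = 2.0, 0.5, 2  # toy data x1 ~ N(M, S^2 Id) in D dimensions; noise x0 ~ N(0, Id)

def sample_coefficients(n, rng):
    """Diagonal operators stored as arrays: an independent time per sample and coordinate."""
    t = rng.random((n, D))
    return 1.0 - t, t

def eta0_exact(alpha, beta, x):
    """Closed-form E[x0 | I = x] for the Gaussian toy (elementwise, diagonal operators)."""
    var = alpha**2 + beta**2 * S**2
    return alpha / var * (x - beta * M)

def eta1_exact(alpha, beta, x):
    """Closed-form E[x1 | I = x] for the Gaussian toy."""
    var = alpha**2 + beta**2 * S**2
    return M + beta * S**2 / var * (x - beta * M)

def denoiser_losses(eta0_hat, eta1_hat, n, rng):
    """Monte Carlo estimates of the two regression objectives defining eta_0 and eta_1."""
    x0 = rng.standard_normal((n, D))
    x1 = M + S * rng.standard_normal((n, D))
    alpha, beta = sample_coefficients(n, rng)
    interp = alpha * x0 + beta * x1  # I_{alpha,beta}(x0, x1)
    loss0 = np.mean(np.sum((eta0_hat(alpha, beta, interp) - x0) ** 2, axis=1))
    loss1 = np.mean(np.sum((eta1_hat(alpha, beta, interp) - x1) ** 2, axis=1))
    return loss0, loss1

# The exact conditional expectations should beat any other candidate, e.g. a damped one.
exact = denoiser_losses(eta0_exact, eta1_exact, 100_000, np.random.default_rng(0))
damped = denoiser_losses(
    lambda a, b, x: 0.5 * eta0_exact(a, b, x),
    lambda a, b, x: 0.5 * eta1_exact(a, b, x),
    100_000,
    np.random.default_rng(0),  # same seed, so both use identical samples
)
```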
These drifts can be used to perform inference along any path in $(\alpha,\beta)$-space using:
Probability flow ODE
$$ \begin{align*} \dot x_t = \dot \alpha_t
\eta_0(\alpha_t,\beta_t,x_t) + \dot \beta_t
\eta_1(\alpha_t,\beta_t,x_t), \qquad x_0 \overset{d}{=}
I_{\alpha_0,\beta_0} \end{align*} $$
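A minimal forward-Euler discretization of this ODE, reusing a Gaussian toy problem ($x_0 \sim \mathcal N(0,\mathrm{Id})$, $x_1 \sim \mathcal N(m, s^2\,\mathrm{Id})$) so that the drifts are exact and the output can be checked. The path in $(\alpha,\beta)$-space is hypothetical and uses diagonal operators: coordinate 0 follows $(1-t, t)$, coordinate 1 follows $(1-t^2, t^2)$, and coordinate 2 is pinned at $(\alpha,\beta) = (0, 1)$, so it keeps its observed value, as in inpainting. In this toy, both free coordinates should end up distributed as $\mathcal N(m, s^2)$ even though they follow different paths.

```python
import numpy as np

M, S = 2.0, 0.5  # toy data x1 ~ N(M, S^2 Id); noise x0 ~ N(0, Id)

def eta0(alpha, beta, x):
    """Exact E[x0 | I = x] for the Gaussian toy with diagonal operators."""
    var = alpha**2 + beta**2 * S**2
    return alpha / var * (x - beta * M)

def eta1(alpha, beta, x):
    """Exact E[x1 | I = x] for the Gaussian toy with diagonal operators."""
    var = alpha**2 + beta**2 * S**2
    return M + beta * S**2 / var * (x - beta * M)

def path(t):
    """Hypothetical path: alpha_t, beta_t and their time derivatives, per coordinate."""
    tau = np.array([t, t**2, 1.0])        # coordinate 2 is pinned to the data
    dtau = np.array([1.0, 2.0 * t, 0.0])
    return 1.0 - tau, tau, -dtau, dtau

def probability_flow(x, n_steps=1000):
    """Forward Euler for dx/dt = alpha_dot eta_0 + beta_dot eta_1 on t in [0, 1]."""
    dt = 1.0 / n_steps
    for k in range(n_steps):
        alpha, beta, alpha_dot, beta_dot = path(k * dt)
        x = x + dt * (alpha_dot * eta0(alpha, beta, x) + beta_dot * eta1(alpha, beta, x))
    return x

rng = np.random.default_rng(0)
x_init = rng.standard_normal((20_000, 3))  # free coordinates: I at (alpha, beta) = (1, 0), i.e. noise
x_init[:, 2] = 1.7                         # pinned coordinate: an observed data value
x_final = probability_flow(x_init)
```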
SDE
$$ \begin{align*} dx_t = (\dot \alpha_t -\sigma_t^{2}
\alpha_t^{-1})\eta_0(\alpha_t,\beta_t,x_t) dt + \dot \beta_t
\eta_1(\alpha_t,\beta_t,x_t) dt + \sqrt{2} \sigma_t dW_t, \qquad x_0
\overset{d}{=} I_{\alpha_0,\beta_0}\end{align*} $$
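The diffusion coefficient $\sigma_t$ is a free choice. Because the term $\sigma_t^2\alpha_t^{-1}$ diverges where $\alpha_t \to 0$ for a constant $\sigma_t$, the sketch below assumes (our choice, not prescribed by the formula) $\sigma_t = \varepsilon\,\alpha_t$ with scalar $\alpha_t = 1-t$, $\beta_t = t$, so that $\sigma_t^2\alpha_t^{-1} = \varepsilon^2\alpha_t$ stays bounded. It integrates the SDE with the Euler-Maruyama scheme on the same kind of Gaussian toy problem with exact drifts; the samples should again end up distributed as $\mathcal N(m, s^2)$.

```python
import numpy as np

M, S = 2.0, 0.5  # toy data x1 ~ N(M, S^2 Id); noise x0 ~ N(0, Id)

def eta0(alpha, beta, x):
    """Exact E[x0 | I = x] for the Gaussian toy."""
    var = alpha**2 + beta**2 * S**2
    return alpha / var * (x - beta * M)

def eta1(alpha, beta, x):
    """Exact E[x1 | I = x] for the Gaussian toy."""
    var = alpha**2 + beta**2 * S**2
    return M + beta * S**2 / var * (x - beta * M)

def sde_sample(x, rng, eps=1.0, n_steps=1000):
    """Euler-Maruyama for
    dx = (alpha_dot - sigma^2 / alpha) eta_0 dt + beta_dot eta_1 dt + sqrt(2) sigma dW,
    with alpha = 1 - t, beta = t and the assumed choice sigma = eps * alpha.
    """
    dt = 1.0 / n_steps
    alpha_dot, beta_dot = -1.0, 1.0
    for k in range(n_steps):
        t = k * dt
        alpha, beta = 1.0 - t, t
        sigma = eps * alpha
        # sigma^2 / alpha simplifies to eps^2 * alpha, which stays finite as alpha -> 0.
        drift = (alpha_dot - eps**2 * alpha) * eta0(alpha, beta, x) + beta_dot * eta1(alpha, beta, x)
        noise = rng.standard_normal(x.shape)
        x = x + drift * dt + np.sqrt(2.0 * dt) * sigma * noise
    return x

rng = np.random.default_rng(1)
x_init = rng.standard_normal((20_000, 2))  # start from I at (alpha, beta) = (1, 0): pure noise
x_final = sde_sample(x_init, rng)
```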
@article{negrel2025multitasklearningstochasticinterpolants,
author = {Negrel, Hugo and Coeurdoux, Florentin and Albergo, Michael and Vanden-Eijnden, Eric},
title = {Multitask Learning with Stochastic Interpolants},
journal = {NeurIPS2025},
year = {2025},
url = {https://arxiv.org/abs/2508.04605}
}