Variational Inference#

The generative model#

All spatialvi-tools models are based on amortized variational inference (AVI). The core idea is to learn a probabilistic generative model \(p_\theta(x \mid z)\) of observed gene expression \(x\) conditioned on a low-dimensional latent variable \(z\), alongside an approximate posterior \(q_\phi(z \mid x)\) parameterized by an encoder neural network.

Training maximizes the Evidence Lower Bound (ELBO):

\[\mathcal{L} = \mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)] - \mathrm{KL}(q_\phi(z \mid x) \,\|\, p(z))\]
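The two ELBO terms can be estimated numerically as in the minimal sketch below. It is not the library's implementation. It assumes a diagonal Gaussian posterior \(q_\phi(z \mid x)\) described by `mu` and `log_var`, a standard normal prior \(p(z)\), and a user-supplied `log_lik(x, z)` callable standing in for \(\log p_\theta(x \mid z)\). All function and argument names are hypothetical.

```python
import math
import random


def gaussian_kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dims."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def elbo_estimate(x, mu, log_var, log_lik, n_samples=1, rng=None):
    """Monte Carlo ELBO estimate for a single observation x.

    log_lik(x, z) is an assumed callable returning log p(x | z).
    """
    rng = rng or random.Random(0)
    recon = 0.0
    for _ in range(n_samples):
        # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1).
        z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]
        recon += log_lik(x, z)
    recon /= n_samples
    # ELBO = expected reconstruction log-likelihood minus the KL term.
    return recon - gaussian_kl_to_standard_normal(mu, log_var)
```

The expectation is approximated by sampling, while the KL term is exact for this choice of posterior and prior. In practice a single sample per cell is often enough because the estimate is averaged over a minibatch.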

Gene likelihood#

Most models support two gene likelihood distributions:

  • Negative Binomial (NB): default; models overdispersed count data.

  • Poisson: simpler; suitable for very low-count spatial data.
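To make the difference between the two likelihoods concrete, the sketch below evaluates both log-probability mass functions for a single count. It assumes the common mean / inverse-dispersion parameterization of the Negative Binomial (mean `mu`, inverse dispersion `theta`, variance `mu + mu**2 / theta`). Whether the library uses exactly this parameterization is an assumption, and the function names are illustrative only.

```python
import math


def poisson_log_pmf(x, mu):
    """log Poisson(x | mu) for a non-negative integer count x and rate mu > 0."""
    return x * math.log(mu) - mu - math.lgamma(x + 1.0)


def nb_log_pmf(x, mu, theta):
    """log NB(x | mu, theta) with mean mu > 0 and inverse dispersion theta > 0.

    The variance is mu + mu**2 / theta, so small theta means strong overdispersion.
    """
    log_theta_mu = math.log(theta + mu)
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1.0)
        + theta * (math.log(theta) - log_theta_mu)
        + x * (math.log(mu) - log_theta_mu)
    )
```

As `theta` grows, the extra variance term `mu**2 / theta` vanishes and the Negative Binomial approaches the Poisson, which is why the Poisson can be adequate when counts are low and overdispersion is weak.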

Spatial priors (scVIVA, ResolVI)#

Spatial models extend the standard VAE with a niche-aware prior that conditions the latent distribution on the cellular neighbourhood, encoding microenvironment structure directly into the latent space.
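One generic way to see how a conditional prior changes the objective: if both the posterior and the niche-conditioned prior are diagonal Gaussians, the KL term keeps a closed form but is measured against the niche-dependent prior instead of a fixed one. The sketch below illustrates only that calculation. It does not describe how scVIVA or ResolVI actually construct their priors, and every name in it is hypothetical.

```python
import math


def kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p):
    """KL(N(mu_q, diag(exp(log_var_q))) || N(mu_p, diag(exp(log_var_p)))).

    mu_p and log_var_p are assumed to come from some function of the cell's
    neighbourhood; here they are simply passed in as numbers.
    """
    total = 0.0
    for mq, lvq, mp, lvp in zip(mu_q, log_var_q, mu_p, log_var_p):
        total += lvp - lvq + (math.exp(lvq) + (mq - mp) ** 2) / math.exp(lvp) - 1.0
    return 0.5 * total
```

With `mu_p` set to zero and `log_var_p` set to zero this reduces to the usual KL against a standard normal prior. A prior mean that already sits near the posterior mean yields a smaller penalty, which is how neighbourhood information can shape the latent space.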

KL annealing#

To stabilize early training, the weight on the KL divergence term is annealed from 0 to 1 over the first n_epochs_kl_warmup epochs. This keeps the approximate posterior from collapsing to the prior (posterior collapse) before the encoder has learned a meaningful representation.
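A warmup schedule of this kind can be sketched as follows. The linear ramp is an assumption, since the text only states the endpoints and the duration. Apart from `n_epochs_kl_warmup`, the function and argument names are hypothetical.

```python
def kl_weight(epoch, n_epochs_kl_warmup, min_weight=0.0, max_weight=1.0):
    """Linearly ramp the KL weight from min_weight to max_weight (assumed schedule)."""
    if n_epochs_kl_warmup is None or n_epochs_kl_warmup <= 0:
        # No warmup requested: use the full KL weight from the start.
        return max_weight
    frac = min(1.0, epoch / n_epochs_kl_warmup)
    return min_weight + (max_weight - min_weight) * frac


def annealed_loss(reconstruction_log_lik, kl, weight):
    """Negative ELBO with the KL term scaled by the current warmup weight."""
    return -(reconstruction_log_lik - weight * kl)
```

At epoch 0 the loss contains only the reconstruction term, so the encoder is free to spread cells out in latent space. Once `epoch` reaches `n_epochs_kl_warmup` the weight stays at 1 and the loss equals the negative ELBO.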

References#

  • Lopez et al. (2018) Deep generative modeling for single-cell transcriptomics. Nature Methods.

  • Levy et al. (2025) scVIVA. bioRxiv.