ResolVI#
ResolVI (Python class ResolVI) is a generative model of single-cell resolved spatial
transcriptomics that can subsequently be used for many common downstream tasks.
The advantages of ResolVI are:
Addresses noise and bias in spatial transcriptomics (ST) data caused by segmentation errors, unspecific background, and limited spatial resolution.
Scalable to very large datasets (>1 million cells).
The limitations of ResolVI include:
Effectively requires a GPU for fast inference.
Latent space is not interpretable, unlike that of a linear method.
Assumes single cells are observed, so it does not work with low-resolution ST technologies like Visium or Slide-Seq.
Preliminaries#
ResolVI takes as input spatially resolved RNA-seq count matrices obtained after cell segmentation and assignment of molecules to cells. These counts can be derived either from sequencing spatially resolved molecules or from fluorescent imaging. ResolVI leverages the gene expression of neighboring cells and reassigns observed gene expression to neighboring cells as well as to an unspecific background.
ResolVI accepts as input the observed expression of the cell itself, its spatial neighbors and their gene expression as well as the distance between these cells. Additionally, a vector of categorical covariates \(S\), representing batch, donor, etc., is an optional input to the model. ResolVI provides a semi-supervised mode, adjusting the prior in the latent space for different cell types and training a classifier to predict cell types from latent embeddings.
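To make the neighbor inputs concrete, the following minimal sketch computes, for each cell, the indices of its k nearest spatial neighbors and the corresponding distances. The helper `spatial_neighbors` and the toy coordinates are hypothetical and only illustrate the kind of input described above; they are not how ResolVI builds its neighbor structure, and the brute-force search is meant for small examples only.

```python
import math


def spatial_neighbors(coords, k):
    """Return (indices, distances) of the k nearest neighbors of every cell.

    coords: list of (x, y) centroids, one per segmented cell.
    Hypothetical helper for illustration; brute force, O(n^2).
    """
    indices, distances = [], []
    for i, ci in enumerate(coords):
        # Euclidean distance from cell i to every other cell (self excluded).
        others = [(math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i]
        nearest = sorted(others)[:k]
        indices.append([j for _, j in nearest])
        distances.append([d for d, _ in nearest])
    return indices, distances


# Toy centroids for four cells.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
neighbor_idx, neighbor_dist = spatial_neighbors(coords, k=2)
print(neighbor_idx[0], neighbor_dist[0])  # [1, 2] [1.0, 2.0]
```

For real datasets a tree-based or approximate nearest-neighbor search would be the usual choice.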
Generative process#
ResolVI posits that the observed expression of cell \(n\) in gene \(g\), \(x_{ng}\), is generated by the following process:
In particular, \(z_n\) and \(z_{N(n)}\) are the latent embeddings of the cell itself and of its spatial neighbors, respectively, both of dimension \(L\). ResolVI uses a mixture-of-Gaussians prior on \(z_n\):
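In generic form, a mixture-of-Gaussians prior can be written as below. The number of components \(K\), the mixture weights \(\pi_k\), the component parameters \(\mu_k\) and \(\sigma_k\), and the diagonal covariance are notation assumed here for illustration and may differ from the parameterization ResolVI uses.

```latex
p(z_n) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(z_n \mid \mu_k, \operatorname{diag}(\sigma_k^2)\right),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1
```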
In brief, we assume that the observed expression of gene \(g\) in cell \(n\) can be modelled as a sum of three components: expression truly produced by the cell (proportion \(\alpha_0\)), expression that originates from neighboring cells but was wrongly assigned to \(n\) (proportion \(\alpha_1\)), and unspecific background (proportion \(\alpha_2\)).
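The three-component decomposition can be sketched numerically. The function below only mixes per-gene means for a single cell; the function name, argument layout, and toy numbers are assumptions for illustration, and the likelihood and any scaling used by the actual model are deliberately left out.

```python
def expected_counts(alpha, mu_cell, mu_neighbors, beta, background):
    """Mix three per-gene sources into the expected observed expression of one cell.

    alpha        : (true, diffusion, background) proportions, summing to 1
    mu_cell      : per-gene expression truly produced by the cell
    mu_neighbors : per-gene expression of each neighbor (one list per neighbor)
    beta         : per-neighbor diffusion weights, summing to 1
    background   : per-gene unspecific background
    """
    a_true, a_diff, a_bg = alpha
    mixed = []
    for g in range(len(mu_cell)):
        # Expression diffusing in from neighbors, weighted per neighbor.
        diffused = sum(b * mu[g] for b, mu in zip(beta, mu_neighbors))
        mixed.append(a_true * mu_cell[g] + a_diff * diffused + a_bg * background[g])
    return mixed


expected = expected_counts(
    alpha=(0.5, 0.25, 0.25),
    mu_cell=[8.0, 0.0],
    mu_neighbors=[[0.0, 4.0], [0.0, 12.0]],
    beta=[0.5, 0.5],
    background=[4.0, 4.0],
)
print(expected)  # [5.0, 3.0]
```

In this toy example the second gene is not expressed by the cell at all, yet its expected observed value is nonzero because of neighbor diffusion and background. This is the kind of signal the model aims to remove.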
The latent variables, along with their description, are summarized in the following table:
| Latent variable | Description | Code variable (if different) |
|---|---|---|
| \(z_n \in \mathbb{R}^L\) | Low-dimensional representation capturing the state of a cell | |
| \(\beta_{N(n)} \in \Delta^{N(n) - 1}\) | Per-neighbor diffusion | |
| \(\alpha_{n0 \dots 2} \in \Delta^{2}\) | Per cell true, diffusion and background proportion | |
| \(bg_{ng} \in \Delta^{G - 1}\) | Per cell estimate of background | |
| \(background_{s} \in \mathbb{R}^G\) | Per sample background vector | |
| \(\rho_n \in \Delta^{G - 1}\) | Per cell rate of expression | |
| \(\mu_n, \mu_{N(n)} \in \mathbb{R}^G\) | Per cell estimated expression | |
Inference#
ResolVI uses variational inference, specifically auto-encoding variational Bayes implemented in Pyro, to learn both the model parameters (the neural network parameters, dispersion parameters, etc.) and an approximate posterior distribution. Inference is amortized with neural networks for \(z_n\) and \(\alpha_n\), while \(\beta_{N(n)n}\) is estimated separately for each cell.
Tasks#
Here we provide an overview of some of the tasks that ResolVI can perform. Please see spatialvi.ResolVI for the full API reference.
Dimensionality reduction#
For dimensionality reduction, the mean of the approximate posterior \(q_\phi(z_i \mid y_i, n_i)\) is returned by default. This is achieved using the method:
>>> adata.obsm["X_resolvi"] = model.get_latent_representation()
Users may also return samples from this distribution, as opposed to the mean, by passing the argument give_mean=False.
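The difference between the two modes can be sketched as follows, assuming a diagonal Gaussian approximate posterior with location `loc` and scale `scale` per latent dimension. The helper `latent_from_posterior` is hypothetical and is not part of the library; it only mirrors the meaning of the give_mean flag.

```python
import random


def latent_from_posterior(loc, scale, give_mean=True, rng=None):
    """Return the posterior mean, or one sample drawn around it.

    Assumes a diagonal Gaussian posterior (an assumption of this sketch).
    """
    if give_mean:
        return list(loc)
    rng = rng or random.Random()
    # z = loc + scale * eps with eps ~ N(0, 1), drawn independently per dimension.
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(loc, scale)]


loc, scale = [0.5, -1.0, 2.0], [0.1, 0.1, 0.1]
z_mean = latent_from_posterior(loc, scale)  # deterministic
z_sample = latent_from_posterior(loc, scale, give_mean=False, rng=random.Random(0))
```

The mean is the usual choice for visualization and clustering, while samples are useful when downstream estimates should reflect posterior uncertainty.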
The latent representation can be used to create a nearest neighbor graph with scanpy with:
>>> import scanpy as sc
>>> sc.pp.neighbors(adata, use_rep="X_resolvi")
>>> adata.obsp["distances"]
Transfer learning#
A ResolVI model can be pre-trained on reference data and updated with query data using load_query_data(), which then facilitates transfer of metadata like cell type annotations.
Estimation of true expression levels#
get_normalized_expression() returns the expected value of the true expression \(\rho_n\) under the approximate posterior. For one cell \(n\), this can be written as:
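An expectation under the approximate posterior is typically estimated by Monte Carlo: draw latent samples, decode each one, and average. The sketch below illustrates that idea only. `toy_decoder` is a hypothetical fixed linear map followed by a softmax standing in for the learned decoder, the posterior is assumed to be a diagonal Gaussian, and covariates and any scaling are omitted.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax; the output sums to 1."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def toy_decoder(z):
    """Hypothetical stand-in for the decoder: 2-d latent -> 3 gene rates."""
    weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    return softmax([w[0] * z[0] + w[1] * z[1] for w in weights])


def expected_rho(loc, scale, n_samples=200, seed=0):
    """Monte Carlo estimate of the posterior expectation of rho for one cell."""
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(n_samples):
        # Sample z from the assumed Gaussian posterior, then decode it.
        z = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(loc, scale)]
        for g, rate in enumerate(toy_decoder(z)):
            totals[g] += rate
    return [t / n_samples for t in totals]


rho_hat = expected_rho(loc=[1.0, -0.5], scale=[0.3, 0.3])
```

Because every decoded sample lies on the simplex, the averaged estimate `rho_hat` also sums to one.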
Differential expression#
Differential expression analysis is achieved with differential_expression().
ResolVI tests differences in expression levels \(\rho_{n} = f_{\theta}\left(z_n, s_n\right)\).
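One common way to test differences in \(\rho\) is to compare posterior samples from two groups and report how often the log fold change exceeds a threshold. The sketch below follows that approach. Whether differential_expression() in ResolVI uses exactly this criterion is an assumption, and `prob_de`, `delta`, and `eps` are hypothetical names chosen for illustration.

```python
import math


def prob_de(rho_a, rho_b, delta=0.25, eps=1e-8):
    """Summarize paired posterior samples of one gene's rate in two groups.

    Returns (fraction of samples with |log2 fold change| > delta, mean log2 fold change).
    """
    # eps guards against log(0) for genes with vanishing rates.
    lfcs = [math.log2(a + eps) - math.log2(b + eps) for a, b in zip(rho_a, rho_b)]
    p_changed = sum(abs(lfc) > delta for lfc in lfcs) / len(lfcs)
    mean_lfc = sum(lfcs) / len(lfcs)
    return p_changed, mean_lfc


# Toy posterior samples of one gene's rate in group A and group B.
rho_a = [0.020, 0.022, 0.018, 0.010]
rho_b = [0.010, 0.010, 0.010, 0.010]
p_changed, mean_lfc = prob_de(rho_a, rho_b)
print(p_changed)  # 0.75
```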
Cell-type prediction#
Prediction of cell-type labels is performed with predict().
A semi-supervised model is necessary to perform this analysis, as it relies on the cell-type classifier.
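Conceptually, a classifier on the latent space maps an embedding to class probabilities and reports the most probable label. The sketch below uses a hypothetical linear classifier with hand-picked weights; the classifier architecture ResolVI actually trains is not specified here.

```python
import math


def predict_label(z, weights, biases, labels):
    """Return (most probable label, class probabilities) for one latent vector.

    weights/biases define a hypothetical linear classifier, one row per label.
    """
    logits = [sum(w_i * z_i for w_i, z_i in zip(w, z)) + b
              for w, b in zip(weights, biases)]
    # Softmax over logits, shifted by the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs


label, probs = predict_label(
    z=[2.0, -1.0],
    weights=[[1.0, 0.0], [0.0, 1.0]],
    biases=[0.0, 0.0],
    labels=["T cell", "B cell"],
)
print(label)  # T cell
```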
Differential niche abundance#
Differential niche abundance analysis is achieved with differential_niche_abundance().
A semi-supervised model is necessary to perform this analysis, as it relies on the cell-type classifier.
Quick Start#
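import spatialvi
# `adata` is an AnnData object holding raw counts in the "counts" layer
# and cell coordinates under the "spatial" key
# Set up and train
spatialvi.ResolVI.setup_anndata(adata, layer="counts", spatial_key="spatial")
model = spatialvi.ResolVI(adata)
model.train()
# Get corrected expression
corrected = model.get_normalized_expression(adata)
# Get latent representation and store it for downstream analysis
latent = model.get_latent_representation()
adata.obsm["X_resolvi"] = latent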
Reference#
Ergen & Yosef (2025) — ResolVI - addressing noise and bias in spatial transcriptomics. bioRxiv. doi: 10.1101/2025.01.20.634005