# ResolVI

**ResolVI** (Python class {class}`~spatialvi.ResolVI`) is a generative model of single-cell-resolved spatial
transcriptomics (ST) data that can be used for many common downstream tasks.

The advantages of ResolVI are:

-   Addresses noise and bias in ST data caused by segmentation errors, nonspecific background and limited spatial resolution.
-   Scalable to very large datasets (>1 million cells).

The limitations of ResolVI include:

-   Effectively requires a GPU for fast inference.
-   Latent space is not interpretable, unlike that of a linear method.
-   Assumes that single cells are observed, so it does not work with low-resolution ST such as Visium or Slide-seq.

```{topic} Tutorials:

-   {doc}`/tutorials/resolVI_tutorial`
```

## Preliminaries

ResolVI takes as input spatially resolved gene expression count matrices obtained after cell segmentation and
assignment of molecules to cells. These counts can be derived either from sequencing-based spatial methods or from
fluorescence imaging. Using the gene expression of neighboring cells, ResolVI attributes part of each cell's observed
counts to its neighbors and to a nonspecific background.

For each cell, ResolVI takes as input its observed expression, its spatial neighbors and their gene expression, and
the distances between the cell and these neighbors. Additionally, a vector of categorical covariates $S$, representing
batch, donor, etc., is an optional input to the model. ResolVI also provides a semi-supervised mode, in which the
latent prior is adapted to the different cell types and a classifier is trained to predict cell types from the latent
embeddings.
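The neighborhood inputs described above (neighbor identities, their expression and the distances to them) can be illustrated with a brute-force k-nearest-neighbor search on 2D cell coordinates. This is a minimal sketch assuming Euclidean kNN; the function names `spatial_knn` and `neighbor_expression` are hypothetical, and ResolVI's own neighbor construction (number of neighbors, metric, implementation) may differ.

```python
import numpy as np


def spatial_knn(coords: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Return indices and Euclidean distances of the k nearest spatial neighbors.

    Brute-force O(N^2) sketch; a cell is never counted as its own neighbor.
    """
    # Pairwise Euclidean distances between all cells, shape (N, N).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))
    # Exclude self-matches by setting the diagonal to infinity.
    np.fill_diagonal(dist, np.inf)
    # Indices of the k closest cells per row, sorted by increasing distance.
    idx = np.argsort(dist, axis=1)[:, :k]
    return idx, np.take_along_axis(dist, idx, axis=1)


def neighbor_expression(counts: np.ndarray, idx: np.ndarray) -> np.ndarray:
    """Gather the expression of each cell's neighbors, shape (N, k, G)."""
    return counts[idx]


coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
counts = np.arange(12).reshape(4, 3)  # toy count matrix: 4 cells x 3 genes
idx, dist = spatial_knn(coords, k=2)
nbr_counts = neighbor_expression(counts, idx)
```

For large datasets, a tree-based or approximate neighbor search would replace the dense distance matrix.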

## Generative process

ResolVI posits that the observed expression of cell $n$ in gene $g$, $x_{ng}$, is generated by the following process:

```{math}
:nowrap: true

\begin{align}
    z &\sim \mathrm{MixtureOfGaussians}(\mu_1, \dots, \mu_K, \Sigma_1, \dots, \Sigma_K) \\
    \alpha_n &\sim \mathrm{Dirichlet}(C) \\
    r_{ng} &\sim \mathrm{Exponential}(R) \\
    h_{ng} &=
    \mathrm{Gamma}(r_{ng}, \frac{r_{ng}}{\alpha_0 f_\theta(z, b) + \alpha_1 \sum\limits_{{N(n)}} \beta_{N(n)} f_\theta(z_{N(n)}, b)}) + \alpha_2 bg\\
    x_{ng} &\sim \mathrm{Poisson}(l_n h_{ng})
\end{align}
```

In particular, $z$ and $z_{N(n)}$ are the latent embeddings of the cell itself and of its spatial neighbors,
respectively, both of dimension $L$. ResolVI uses a mixture-of-Gaussians prior on $z$:

```{math}
:nowrap: true

\begin{align}
    c_n &\sim \textrm{Categorical}(
        \pi_1, \pi_2, \dots, \pi_K
    ), \\
    z_n \mid c_n = c &\sim \mathcal{N}(\mu_c, \Sigma_c)
\end{align}
```
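The mixture-of-Gaussians prior above can be sampled ancestrally (first the component $c_n$, then $z_n$) and evaluated with a log-sum-exp over components. This sketch assumes diagonal covariances, so `sigma[c]` holds per-dimension standard deviations for $\Sigma_c$; the function names are hypothetical and not part of the `spatialvi` API.

```python
import numpy as np


def sample_mog(pi, mu, sigma, n, rng):
    """Draw c_n ~ Categorical(pi), then z_n | c_n ~ N(mu_c, diag(sigma_c**2))."""
    c = rng.choice(len(pi), size=n, p=pi)
    z = mu[c] + sigma[c] * rng.standard_normal((n, mu.shape[1]))
    return c, z


def mog_log_density(z, pi, mu, sigma):
    """log p(z) = logsumexp_c [log pi_c + log N(z; mu_c, diag(sigma_c**2))]."""
    L = mu.shape[1]
    diff = (z[:, None, :] - mu[None, :, :]) / sigma[None, :, :]  # (n, K, L)
    log_normal = (
        -0.5 * (diff**2).sum(axis=-1)
        - np.log(sigma).sum(axis=-1)[None, :]
        - 0.5 * L * np.log(2 * np.pi)
    )
    a = np.log(pi)[None, :] + log_normal  # (n, K)
    # Numerically stable log-sum-exp over the K components.
    a_max = a.max(axis=1, keepdims=True)
    return (a_max + np.log(np.exp(a - a_max).sum(axis=1, keepdims=True)))[:, 0]


rng = np.random.default_rng(0)
pi = np.array([0.3, 0.7])
mu = np.array([[-2.0, 0.0], [2.0, 0.0]])
sigma = np.ones((2, 2))
c, z = sample_mog(pi, mu, sigma, n=1000, rng=rng)
log_p = mog_log_density(z, pi, mu, sigma)
```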

In brief, we model the observed expression of gene $g$ in cell $n$ as a sum of three components, weighted by
$\alpha_n$: expression truly originating from the cell itself ($\alpha_0$), expression originating from neighboring
cells and wrongly assigned to cell $n$ ($\alpha_1$), and a nonspecific background component ($\alpha_2$).
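Because $\mathrm{Gamma}(r, r/m)$ has mean $m$, the expected count splits into the three components just described: $\mathbb{E}[x_{ng}] = l_n\left(\alpha_0 \rho_{ng} + \alpha_1 \sum_{N(n)} \beta_{N(n)} \rho_{N(n)g} + \alpha_2 bg_{ng}\right)$. The sketch below computes this decomposition and forward-simulates one cell. It assumes that the decoder outputs $f_\theta(\cdot)$ are given directly as arrays (the covariates $b$ and the sampling of $z$ are omitted) and that $R$ is a rate parameter; all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(a: np.ndarray) -> np.ndarray:
    """Softmax over the last axis (maps logits onto the gene simplex)."""
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def expected_components(library, alpha, rho_self, rho_nbrs, beta, bg):
    """Expected counts of the own, neighbor and background components.

    Uses E[Gamma(r, r / m)] = m, so E[x_ng] = l_n * E[h_ng].
    """
    own = library * alpha[0] * rho_self
    nbr = library * alpha[1] * (beta @ rho_nbrs)
    back = library * alpha[2] * bg
    return own, nbr, back


def simulate_cell(rho_self, rho_nbrs, beta, bg, library, C=(1.0, 1.0, 1.0), R=1.0):
    """Draw one cell's counts x_n following the generative process above."""
    G = rho_self.shape[0]
    alpha = rng.dirichlet(C)                    # alpha_n ~ Dirichlet(C)
    r = rng.exponential(scale=1.0 / R, size=G)  # r_ng ~ Exponential(R), R read as a rate
    mean = alpha[0] * rho_self + alpha[1] * (beta @ rho_nbrs)
    # numpy's gamma takes shape and scale = 1 / rate, so scale = mean / r.
    h = rng.gamma(shape=r, scale=mean / r) + alpha[2] * bg
    return rng.poisson(library * h)


# Worked example with two genes and two neighbors.
own, nbr, back = expected_components(
    library=100.0,
    alpha=np.array([0.7, 0.2, 0.1]),
    rho_self=np.array([0.9, 0.1]),
    rho_nbrs=np.array([[0.2, 0.8], [0.4, 0.6]]),
    beta=np.array([0.5, 0.5]),
    bg=np.array([0.5, 0.5]),
)
# own = [63, 7], nbr = [6, 14], back = [5, 5]; the total [74, 26] sums to the library size.

# Random draw using hypothetical decoder outputs (softmax of random logits).
G, K = 5, 4
x = simulate_cell(
    rho_self=softmax(rng.normal(size=G)),
    rho_nbrs=softmax(rng.normal(size=(K, G))),
    beta=rng.dirichlet(np.ones(K)),
    bg=rng.dirichlet(np.ones(G)),
    library=100.0,
)
```

Since $\alpha_n$, $\beta_{N(n)}$ and all expression profiles lie on simplices, the three expected components always add up to the library size $l_n$.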

The latent variables, along with their description, are summarized in the following table:

```{eval-rst}
.. list-table::
   :widths: 20 90 15
   :header-rows: 1

   * - Latent variable
     - Description
     - Code variable (if different)
   * - :math:`z_n \in \mathbb{R}^L`
     - Low-dimensional representation capturing the state of a cell
     - ``latent``
   * - :math:`\beta_{N(n)} \in \Delta^{|N(n)| - 1}`
     - Per-neighbor diffusion weights
     - ``per_neighbor_diffusion``
   * - :math:`\alpha_{n0 \dots 2} \in \Delta^{2}`
     - Per-cell proportions of true, diffusion (neighbor-derived) and background expression
     - ``mixture_proportions``
   * - :math:`bg_{n} \in \Delta^{G - 1}`
     - Per-cell estimate of background expression
     - ``background``
   * - :math:`\mathrm{background}_{s} \in \mathbb{R}^G`
     - Per-sample background vector
     - ``per_gene_background``
   * - :math:`\rho_n \in \Delta^{G - 1}`
     - Per-cell normalized rate of expression
     - ``px_scale``
   * - :math:`\mu_n, \mu_{N(n)} \in \mathbb{R}^G`
     - Per-cell estimated expression of the cell and of its neighbors
     - ``px_rate`` and ``px_rate_n``
```


## Inference

ResolVI uses variational inference, specifically auto-encoding variational Bayes implemented in Pyro, to learn both the
model parameters (the neural network parameters, dispersion parameters, etc.) and an approximate posterior distribution.
Inference of $z_n$ and $\alpha_n$ is amortized with neural network encoders, whereas $\beta_{N(n)}$ is estimated
separately for each cell rather than predicted by an encoder.
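The difference between amortized and per-cell (non-amortized) parameters can be sketched as follows. A single shared encoder maps any cell's expression to its mixture proportions $\alpha_n$, so its parameter count does not grow with the number of cells, while $\beta_{N(n)}$ is represented by one free logit per (cell, neighbor) pair. The linear encoder here is a hypothetical stand-in for ResolVI's neural network encoder.

```python
import numpy as np

rng = np.random.default_rng(0)
N, G, K = 6, 4, 3  # cells, genes, neighbors per cell (toy sizes)


def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


# Amortized: one shared (hypothetical) linear encoder turns any cell's
# log-expression into its three mixture proportions alpha_n.
W_enc = rng.normal(scale=0.1, size=(G, 3))


def encode_alpha(x):
    return softmax(np.log1p(x) @ W_enc)


# Non-amortized: one free logit per (cell, neighbor) pair, optimized directly.
beta_logits = np.zeros((N, K))

x = rng.poisson(5.0, size=(N, G)).astype(float)
alpha = encode_alpha(x)      # (N, 3); also applicable to unseen cells
beta = softmax(beta_logits)  # (N, K); uniform until optimized
```

The encoder has `G * 3` parameters regardless of `N`, whereas the per-cell table grows as `N * K`.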

## Tasks

Here we provide an overview of some of the tasks that ResolVI can perform. Please see {class}`spatialvi.ResolVI`
for the full API reference.

### Dimensionality reduction

For dimensionality reduction, the mean of the approximate posterior $q_\phi(z_n \mid x_n)$ is returned by default.
This is achieved using the method:

```
>>> adata.obsm["X_resolvi"] = model.get_latent_representation()
```

To return samples from this distribution instead of the mean, pass the argument `give_mean=False`.
The latent representation can be used to build a nearest-neighbor graph with scanpy:

```
>>> import scanpy as sc
>>> sc.pp.neighbors(adata, use_rep="X_resolvi")
>>> adata.obsp["distances"]
```

### Transfer learning

A ResolVI model can be pre-trained on reference data and updated with query data using {meth}`~spatialvi.ResolVI.load_query_data`, which then facilitates the transfer of metadata such as cell-type annotations.

### Estimation of true expression levels

In {meth}`~spatialvi.ResolVI.get_normalized_expression`, ResolVI returns the expected value of the true expression $\rho_n$ under the approximate posterior. For one cell $n$, this can be written as:

```{math}
:nowrap: true

\begin{align}
   \mathbb{E}_{q_\phi(z_n \mid x_n)}\left[f_{\theta}\left(z_{n}, s_n \right) \right]
\end{align}
```
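The expectation above can be estimated by Monte Carlo: draw several $z_n$ from the approximate posterior, decode each and average. This sketch assumes a Gaussian posterior with diagonal scale and replaces $f_\theta$ with a hypothetical softmax-linear decoder; the covariate $s_n$ is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
L, G = 3, 5
W_dec = rng.normal(size=(L, G))  # hypothetical decoder weights standing in for f_theta


def decoder(z):
    """Stand-in for f_theta: softmax(z @ W_dec), so each output lies on the gene simplex."""
    logits = z @ W_dec
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def normalized_expression(q_loc, q_scale, n_samples=200):
    """Monte Carlo estimate of E_q[f_theta(z_n)] with z_n ~ N(q_loc, diag(q_scale**2))."""
    eps = rng.standard_normal((n_samples, q_loc.shape[-1]))
    z = q_loc + q_scale * eps       # reparameterized posterior samples
    return decoder(z).mean(axis=0)  # average of the decoded expression profiles


rho_hat = normalized_expression(np.zeros(L), np.full(L, 0.5))
```

Because $f_\theta$ is nonlinear, averaging decoded samples generally differs from decoding the posterior mean; the two coincide only when the posterior scale is zero.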

### Differential expression

Differential expression analysis is achieved with {meth}`~spatialvi.ResolVI.differential_expression`.
ResolVI tests differences in expression levels $\rho_{n} = f_{\theta}\left(z_n, s_n\right)$.
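As an illustration of testing differences in $\rho_n$, the sketch below compares posterior samples of normalized expression between two groups via log2 fold changes and reports the fraction of samples whose absolute LFC exceeds a threshold `delta`. It is a simplified sketch in the spirit of the "change" mode of scvi-tools differential expression; samples are paired index-wise for simplicity, and the exact procedure used by `differential_expression` may differ. The function name `lfc_test` is hypothetical.

```python
import numpy as np


def lfc_test(rho_a, rho_b, delta=0.25, eps=1e-8):
    """Compare posterior samples of rho (shape (S, G)) between two groups.

    Returns the median log2 fold change per gene and the posterior
    probability that |LFC| >= delta.
    """
    lfc = np.log2(rho_a + eps) - np.log2(rho_b + eps)
    return np.median(lfc, axis=0), (np.abs(lfc) >= delta).mean(axis=0)


rho_a = np.tile([0.5, 0.25, 0.25], (4, 1))  # 4 posterior samples, group A
rho_b = np.tile([0.25, 0.25, 0.5], (4, 1))  # 4 posterior samples, group B
lfc_median, prob_de = lfc_test(rho_a, rho_b)
```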

### Cell-type prediction

Prediction of cell-type labels is performed with {meth}`~spatialvi.ResolVI.predict`.
This requires a model trained in semi-supervised mode, because prediction relies on the cell-type classifier.
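Conceptually, prediction applies a classifier to the latent embeddings and returns the most probable label. The sketch below uses a hypothetical linear softmax classifier; the architecture of ResolVI's actual classifier is not described here and may differ.

```python
import numpy as np


def predict_cell_types(z, W, b, labels):
    """Hypothetical linear softmax classifier on latent embeddings z of shape (N, L)."""
    logits = z @ W + b
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return np.asarray(labels)[p.argmax(axis=1)], p


z = np.array([[2.0, 0.0], [0.0, 2.0]])
pred, probs = predict_cell_types(z, W=np.eye(2), b=np.zeros(2), labels=["T cell", "B cell"])
```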

### Differential niche abundance

Differential niche abundance analysis is achieved with {meth}`~spatialvi.ResolVI.differential_niche_abundance`.
This analysis requires a model trained in semi-supervised mode, because it relies on the cell-type classifier.
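One way to picture niche abundance is as the cell-type composition of each cell's spatial neighborhood, computed from the classifier's predicted probabilities and then compared between groups of cells. This sketch assumes that definition; it is not the implementation of `differential_niche_abundance`, which may use a different statistic and test. Function names are hypothetical.

```python
import numpy as np


def niche_composition(probs, nbr_idx):
    """Mean predicted cell-type probabilities over each cell's neighbors, shape (N, T)."""
    return probs[nbr_idx].mean(axis=1)


def composition_difference(comp, groups, a, b):
    """Difference in mean niche composition between groups a and b."""
    groups = np.asarray(groups)
    return comp[groups == a].mean(axis=0) - comp[groups == b].mean(axis=0)


probs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])  # (N, T) classifier output
nbr_idx = np.array([[1, 3], [0, 2], [1, 3], [0, 2]])                # (N, k) neighbor indices
comp = niche_composition(probs, nbr_idx)
diff = composition_difference(comp, ["A", "B", "A", "B"], "A", "B")
```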

## Quick Start

```python
import spatialvi

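# Set up and train; `adata` is an AnnData object with raw counts in
# adata.layers["counts"] and spatial coordinates stored under the key "spatial"
spatialvi.ResolVI.setup_anndata(adata, layer="counts", spatial_key="spatial")
model = spatialvi.ResolVI(adata)
model.train()  # a GPU is strongly recommended for fast inference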

# Get corrected expression
corrected = model.get_normalized_expression(adata)

# Get latent representation
latent = model.get_latent_representation()
```

## Reference

> Ergen & Yosef (2025). ResolVI - addressing noise and bias in spatial transcriptomics. *bioRxiv*. doi: 10.1101/2025.01.20.634005
