Abstract
Multi-omic single-cell datasets, in which multiple molecular modalities are profiled within the same cell, offer an opportunity to understand the temporal relationship between epigenome and transcriptome. To realize this potential, we developed MultiVelo, a differential equation model of gene expression that extends the RNA velocity framework to incorporate epigenomic data. MultiVelo uses a probabilistic latent variable model to estimate the switch time and rate parameters of chromatin accessibility and gene expression and improves the accuracy of cell fate prediction compared to velocity estimates from RNA only. Application to multi-omic single-cell datasets from brain, skin and blood cells reveals two distinct classes of genes distinguished by whether chromatin closes before or after transcription ceases. We also find four types of cell states: two states in which epigenome and transcriptome are coupled and two distinct decoupled states. Finally, we identify time lags between transcription factor expression and binding site accessibility and between disease-associated SNP accessibility and expression of the linked genes. MultiVelo is available on PyPI, Bioconda and GitHub (https://github.com/welch-lab/MultiVelo).
This is a preview of subscription content, access via your institution
Access options
Access Nature and 54 other Nature Portfolio journals
Get Nature+, our best-value online-access subscription
$29.99 / 30 days
cancel any time
Subscribe to this journal
Receive 12 print issues and online access
$209.00 per year
only $17.42 per issue
Buy this article
- Purchase on Springer Link
- Instant access to full article PDF
Prices may be subject to local taxes which are calculated during checkout
Similar content being viewed by others
Data availability
10x embryonic mouse brain dataset can be accessed at the 10x website at https://www.10xgenomics.com/resources/datasets/fresh-embryonic-e-18-mouse-brain-5-k-1-standard-1-0-0.
SHARE-seq9 mouse skin dataset can be found at the GEO (GSE140203).
Human brain multi-ome dataset43 can be found at the GEO (GSE162170) and the authors’ GitHub page.
ChIP-seq peaks for bulk CD34+ HSPC69 were downloaded from the GEO (GSE70677).
The processed files of the newly sequenced 10x Multiome HSPC samples are available at the GEO (GSE209878). Raw sequences were uploaded to dbGaP phs002915.v1.p1 under restricted access due to patient privacy concerns. Source data are provided with this paper.
Code availability
MultiVelo is implemented in Python. The package is available on GitHub (https://github.com/welch-lab/MultiVelo), PyPI and Bioconda.
References
Cao, J. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496–502 (2019).
Street, K. et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genom. 19, 477 (2018).
Ji, Z. & Ji, H. Tscan: pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis. Nucleic Acids Res. 44, 117 (2016).
Setty, M. et al. Characterization of cell fate probabilities in single-cell data with Palantir. Nat. Biotechnol. 37, 451–460 (2019).
Welch, J. D., Hartemink, A. J. & Prins, J. F. SLICER: inferring branched, nonlinear cellular trajectories from single cell RNA-seq data. Genome Biol. 17, 106 (2016).
La Manno, G. et al. RNA velocity of single cells. Nature 560, 494–498 (2018).
Bergen, V., Lange, M., Peidli, S., Wolf, F. A. & Theis, F. J. Generalizing RNA velocity to transient cell states through dynamical modeling. Nat. Biotechnol. 38, 1408–1414 (2020).
Gorin, G., Svensson, V. & Pachter, L. Protein velocity and acceleration from single-cell multiomics experiments. Genome Biol. 21, 39 (2020).
Ma, S. et al. Chromatin potential identified by shared single-cell profiling of RNA and chromatin. Cell 183, 1103–1116 (2020).
Tedesco, M. et al. Chromatin velocity reveals epigenetic dynamics by single-cell profiling of heterochromatin and euchromatin.Nat. Biotechnol. 40, 235–244 (2021).
Chen, S., Lake, B. B. & Zhang, K. High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell. Nat. Biotechnol. 37, 1452–1457 (2019).
Weake, V. M. & Workman, J. L. Inducible gene expression: diverse regulatory mechanisms. Nat. Rev. Genet. 11, 426–437 (2010).
Rada-Iglesias, A. et al. A unique chromatin signature uncovers early developmental enhancers in humans. Nature 470, 279–283 (2011).
Ostuni, R. et al. Latent enhancers activated by stimulation in differentiated cells. Cell 152, 157–171 (2013).
Lara-Astiaso, D. et al. Chromatin state dynamics during blood formation. Science 345, 943–949 (2014).
Klemm, S. L., Shipony, Z. & Greenleaf, W. J. Chromatin accessibility and the regulatory epigenome. Nat. Rev. Genet. 20, 207–220 (2019).
Merkle, F. T., Tramontin, A. D., García-Verdugo, J. M. & Alvarez-Buylla, A. Radial glia give rise to adult neural stem cells in the subventricular zone. Proc. Natl. Acad. Sci. U.S.A. 101, 17528–17532 (2004).
Hansen, D. V., Lui, J. H., Parker, P. R. L. & Kriegstein, A. R. Neurogenic radial glia in the outer subventricular zone of human neocortex. Nature 464, 554–561 (2010).
Pollen, A. A. et al. Molecular identity of human outer radial glia during cortical development. Cell 163, 55–67 (2015).
Nadarajah, B. & Parnavelas, J. G. Modes of neuronal migration in the developing cerebral cortex. Nat. Rev. Neurosci. 3, 423–432 (2002).
Englund, C. et al. Pax6, Tbr2, and Tbr1 are expressed sequentially by radial glia, intermediate progenitor cells, and postmitotic neurons in developing neocortex. J. Neurosci. 25, 247–251 (2005).
Mayer, S. et al. Multimodal single-cell analysis reveals physiological maturation in the developing human neocortex. Neuron 102, 143–158 (2019).
Tirosh, I. et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science 352, 189–196 (2016).
Arnold, S. J. et al. The T-box transcription factor Eomes/Tbr2 regulates neurogenesis in the cortical subventricular zone. Genes Dev. 22, 2479–2484 (2008).
Vasistha, N. A. et al. Cortical and clonal contribution of Tbr2 expressing progenitors in the developing mouse brain. Cereb. Cortex 25, 3290–3302 (2014).
McEvilly, R. J., de Diaz, M. O., Schonemann, M. D., Hooshmand, F. & Rosenfeld, M. G. Transcriptional regulation of cortical neuron migration by POU domain factors. Science 295, 1528–1532 (2002).
Zahr, S. K. et al. A translational repression complex in developing mammalian neural stem cells that regulates neuronal specification. Neuron 97, 520–537 (2018).
Cusanovich, D. A. et al. A single-cell atlas of in Vivo mammalian chromatin accessibility. Cell 174, 1309–1324 (2018).
Lake, B. B. et al. Integrative single-cell analysis of transcriptional and epigenetic states in the human adult brain. Nat. Biotechnol. 36, 70–80 (2018).
Ranzoni, A. M. et al. Integrative single-cell RNA-seq and ATAC-seq analysis of human developmental hematopoiesis. Cell Stem Cell 28, 472–487.e7 (2021).
Jia, G. et al. Single cell RNA-seq and ATAC-seq analysis of cardiac progenitor cell transition states and lineage settlement. Nat. Commun. 9, 4877 (2018).
Zhang, B. & Hsu, Y.-C. Emerging roles of transit-amplifying cells in tissue regeneration and cancer. Wiley. Interdiscip. Rev. Dev. Biol. 6, e282 (2017).
Coifman, R. R. et al. Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps. Proc. Natl. Acad. Sci. U.S.A. 102, 7426–7431 (2005).
Millar, S. E. et al. WNT signaling in the control of hair growth and structure. Dev. Biol. 207, 133–149 (1999).
Berndt, D. J. & Clifford, J. Using dynamic time warping to find patterns in time series. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, AAAIWS’94, 359–370 (AAAI Press, 1994).
Buenrostro, J. D. et al. Integrated single-cell analysis maps the continuous regulatory landscape of human hematopoietic differentiation. Cell 173, 1535–1548 (2018).
Orkin, S. H. & Zon, L. I. Hematopoiesis: an evolving paradigm for stem cell biology. Cell 132, 631–644 (2008).
Görgens, A. et al. Revision of the human hematopoietic tree: granulocyte subtypes derive from distinct hematopoietic lineages. Cell Rep. 3, 1539–1552 (2013).
Laurenti, E. & Göttgens, B. From haematopoietic stem cells to complex differentiation landscapes. Nature 553, 418–426 (2018).
Pellin, D. et al. A comprehensive single cell transcriptional landscape of human hematopoietic progenitors. Nat. Commun. 10, 2395 (2019).
Bergen, V., Soldatov, R. A., Peter V, K. & Theis, F. J. RNA velocity—current challenges and future perspectives. Mol. Syst. Biol. 17, e10282 (2021).
Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902.e21 (2019).
Trevino, A. E. et al. Chromatin and gene-regulatory dynamics of the developing human cerebral cortex at single-cell resolution. Cell 184, 5053–5069.e23 (2021).
Schep, A. N., Wu, B., Buenrostro, J. D. & Greenleaf, W. J. chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data. Nat. Methods 14, 975–978 (2017).
Tong, A., Huang, J., Wolf, G., van Dijk, D. & Krishnaswamy, S. Trajectorynet: a dynamic optimal transport network for modeling cellular dynamics. In Proceedings of the 37th International Conference on Machine Learning, ICML’20 (JMLR.org, 2020).
Schiebinger, G. et al. Optimal-transport analysis of single-cell gene expression identifies developmental trajectories in reprogramming. Cell 176, 928–943 (2019).
Qiao, C. & Huang, Y. Representation learning of RNA velocity reveals robust cell transitions. Proc. Natl. Acad. Sci. U.S.A. 118, e2105859118 (2021).
Nelder, J. A. & Mead, R. A simplex method for function minimization. Comput. J. 7, 308–313 (1965).
Bentley, J. L. Multidimensional binary search trees used for associative searching. Commun. ACM 18, 509–517 (1975).
Wilks, S. S. The large-sample distribution of the likelihood ratio for testing composite hypotheses. Ann. Math. Stat. 9, 60–62 (1938).
Kim, J., Sheu, K. M., Cheng, Q. J., Hoffmann, A. & Enciso, G. Stochastic models of nucleosome dynamics reveal regulatory rules of stimulus-induced epigenome remodeling. Cell Rep. 40, 111076 (2022).
Larsson, A. J. M. et al. Genomic encoding of transcriptional burst kinetics. Nature 565, 251–254 (2019).
Giorgino, T. Computing and visualizing dynamic time warping alignments in R: the dtw package. J. Stat. Softw. 31, 1–24 (2009).
Tormene, P., Giorgino, T., Quaglini, S. & Stefanelli, M. Matching incomplete time series with dynamic time warping: an algorithm and an application to post-stroke rehabilitation. Artif. Intell. Med. 45, 11–34 (2009).
Traag, V. A., Waltman, L. & van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 5233 (2019).
Batiuk, M. Y. et al. Identification of region-specific astrocyte subtypes at single cell resolution. Nat. Commun. 11, 1220 (2020).
Polioudakis, D. et al. A single-cell transcriptomic atlas of human neocortical development during mid-gestation. Neuron 103, 785–801.e8 (2019).
Cahoy, J. D. et al. A transcriptome database for astrocytes, neurons, and oligodendrocytes: a new resource for understanding brain development and function. J. Neurosci. Res. 28, 264–278 (2008).
Hochgerner, H., Zeisel, A., Lönnerberg, P. & Linnarsson, S. Conserved properties of dentate gyrus neurogenesis across postnatal development revealed by single-cell RNA sequencing. Nat. Neurosci. 21, 290–299 (2018).
Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587.e29 (2021).
Stuart, T., Srivastava, A., Madad, S., Lareau, C. A. & Satija, R. Single-cell chromatin state analysis with Signac. Nat. Methods 18, 1333–1341 (2021).
Welch, J. D. et al. Single-cell multi-omic integration compares and contrasts features of brain cell identity. Cell 177, 1873–1887 (2019).
Karamitros, D. et al. Single-cell analysis reveals the continuum of human lympho-myeloid progenitor cells. Nat. Immunol. 19, 85–97 (2018).
Yáñez, A. et al. Granulocyte-monocyte progenitors and monocyte-dendritic cell progenitors independently produce functionally distinct monocytes. Immunity 47, 890–902.e4 (2017).
Villani, A.-C. et al. Single-cell RNA-seq reveals new types of human blood dendritic cells, monocytes, and progenitors. Science 356, eaah4573 (2017).
Grajkowska, L. T. et al. Isoform-specific expression and feedback regulation of E protein TCF4 control dendritic cell lineage specification. Immunity 46, 65–77 (2017).
Paul, F. et al. Transcriptional heterogeneity and lineage commitment in myeloid progenitors. Cell 163, 1663–1677 (2015).
Eden, E., Navon, R., Steinfeld, I., Lipson, D. & Yakhini, Z. GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinform. 10, 48 (2009).
Romano, O. et al. Transcriptional, epigenetic and retroviral signatures identify regulatory regions involved in hematopoietic lineage commitment. Sci. Rep. 6, 24724 (2016).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
Fornes, O. et al. JASPAR 2020: update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 48, D87–D92 (2019).
Koldamova, R. et al. Genome-wide approaches reveal EGR1-controlled regulatory networks associated with neurodegeneration. Neurobiol. Dis. 63, 107–114 (2014).
Tosic, J. et al. Eomes and Brachyury control pluripotency exit and germ-layer segregation by changing the chromatin state. Nat. Cell Biol. 21, 1518–1531 (2019).
Spiteri, E. et al. Identification of the transcriptional targets of FOXP2, a gene linked to speech and language, in developing human brain. Am. J. Hum. Genet. 81, 1144–1157 (2007).
Golonzhka, O. et al. Pbx regulates patterning of the cerebral cortex in progenitors and postmitotic neurons. Neuron 88, 1192–1207 (2015).
Acknowledgements
This work was supported by National Institutes of Health grants R01AI149669 to K.L.C. and J.D.W., R01HG010883 to J.D.W., F31AI155047 to M.C.V., training grants T32GM070449 to C.L. and T32GM007315 to M.C.V. and additional funding provided by the Rackham Regents Fellowship to M.C.V. The HSPC sample was provided by Cooperative Center of Excellence in Hematology grant DK106829. We thank J. Li, S.C.J. Parker, Y. Gu and members of the Collins lab for helpful discussions.
Author information
Authors and Affiliations
Contributions
M.C.V. and K.L.C. generated the 10x Multiome HSPC data. J.D.W conceived the idea of multi-omic extension of RNA velocity. C.L. and J.D.W. developed and implemented the method, performed data analyses and wrote the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Biotechnology thanks Yuanhua Huang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Additional figures for mouse brain dataset.
a, Canonical marker gene expression for embryonic mouse brain cell types. b, Comparison of Cdh13 fits from scVelo and MultiVelo. An elevating transcription rate due to opening of chromatin produces a more linear fit and better captures the observed phase portrait. c, Scatterplot of gene likelihood against log total spliced count. Gene likelihood is not significantly affected by model assignment or trajectory type. Likelihood does increase with spliced count, as this usually indicates higher quality or highly variable genes. d, Switch times can be used to rank genes by the length of priming and decoupling intervals. Each range is scaled to 1 with outliers (n=1) removed. Top two rows: Histogram of priming intervals. Pbx3 and Celsr1 possess short and long priming phases, respectively. Bottom two rows: Histogram of decoupled intervals. While Rspo3 has a short decoupling phase with few cells within, Tgfbr1’s decoupling phase extends from RNA induction to RNA repression, and up to the end of the trajectory.
Extended Data Fig. 2 Additional figures for HSPC dataset.
a, Canonical marker gene expression for HSPCs. b, Cell cycle (S phase and G2M phase) scores and total unspliced ratio (U/(U+S)) plotted on UMAP coordinates. These factors were regressed out of the total RNA expression (but not the unspliced and spliced counts) during the preprocessing step as they do not appear to be cell-type or lineage specific. c, Box plots of histone modification levels from bulk ChIP-seq of FACS-purified HSCs (center line, median; box, Q1 and Q3; whiskers, 1.5x IQR; points, outliers). Each point in the box plot represents the sum of histone modification signal at chromatin accessibility peaks linked to a Model 1 or Model 2 gene. P-values are from a one-sided Wilcoxon rank-sum test. d, Velocity stream plot from MultiVelo analysis of Day 0 and Day 7 HSPC samples (Top). The majority of arrows go from Day 0 stem cells toward more differentiated Day 7 cells. UMAP coordinates colored by cell-type labels (Middle). UMAP coordinates colored by expression of CD133 (PROM1), an HSPC marker (Bottom).
Extended Data Fig. 3 Additional figures for human brain dataset.
a, Validation of the direction of MEF2C. Left: UMAP with cell types. Top: scVelo’s MEF2C fit produces inconsistency between gene time and global latent time. Bottom: MultiVelo’s results show consistent progression from nIPC to deeper layer (ExDp). b, DTW and UMAP results for EOMES and FOXP2 transcription factors. c, Additional motif DTW alignment results showing time lags between TF gene expression and corresponding motif accessibility. d, The accessibility of TF motifs binned across latent time. The latent time scale was split into 20 equal-sized bins, and the average motif accessibility of cells in each bin was computed and plotted. The motif sequence logos (downloaded from jaspar2020.genereg.net) are shown next to the TF names. e, Time-lag analysis of transcription factors and the expression of their validated downstream target genes. Top: UMAP plots colored by TF and target gene expression. Bottom: Line plots of TF and target gene expression, with correspondences from DTW alignment shown as dotted lines. Magenta: TFs. Cyan: target genes.
Extended Data Fig. 4 Chromatin dynamics, Model 0, the necessity of chromatin preprocessing, and pre-fitting illustrations.
a, Chromatin dynamics illustration: chromatin opening and closing are modeled as asymptotically approaching fully opened (1) or fully closed (0) starting from any initial value. b, Chromatin accessibility change as a function latent time inferred by scVelo using only the RNA portion of the 10X multiome mouse brain dataset (colored by mouse brain cell types). Black lines connect the mean accessibilities within 20 equal-sized windows. The shapes of the ATAC trends are qualitatively very similar to the ODE model we propose. c, Simulation of Model 0 samples. The long delay between chromatin closing and transcription initiation is unlikely to happen in real biological systems. In the rare cases when high chromatin accessibility but low expression or high expression but low accessibility pattern is observed, it is likely due to technical issues such as dropout or background noise. d, The need for normalization as a preprocessing step for ATAC-seq. e, The need for smoothing as a preprocessing step for ATAC-seq. f, Chromatin accessibility results after peak-to-gene aggregation, TF-IDF normalization, and WNN smoothing. It is the same as Fig. S2E. g, Illustration of bi-modal expression pattern for complete genes. Cells at the lower quantile can be far apart in low-dimensional embedded expression space. h, Simplified illustration of model predetermination reasoning. The highest chromatin accessibility region appears in different RNA phases in M1 and M2 genes. i, Illustration of the internal unspliced modality rescaling factor initialization.
Extended Data Fig. 5 Simulation study to assess parameter estimation and model determination.
A total of 1000 genes were simulated with various parameters for both model 1 and model 2. a, C-U view of noiseless simulations of 2000 time-points in the 0-20 hr range. b, U-S view of noiseless simulations from A. c, 3D view of noiseless simulations from A. d, Noise added to simulated points to mimic real data. e, f, Model 1 and Model 2 fits for the same simulated gene (S17). The likelihood is higher under Model 1, consistent with the ground truth. e, Left: 3D view of the fit Model 1 trajectory colored by states, along with predicted switch time points. Middle: simulation with ground-truth switch times. Right: U-S view of fitted trajectory colored by log(c). f, Similar to e, but the fit shown is for Model 2 (the incorrect model). g, h, Model fits for simulated gene S41, similar to e and f, but this time, Model 2 is the ground truth model. MultiVelo correctly identifies the sample to be Model 2 with accurate switch time estimations. The model assignments of 985/1000 samples were correctly predicted based on likelihood.
Supplementary information
Supplementary Information
Supplementary Notes, Figs. 1–6 and Table 1.
Source data
Source Data Fig. 2
Statistical Source Data.
Source Data Fig. 3
Statistical Source Data.
Source Data Fig. 4
Statistical Source Data.
Source Data Fig. 5
Statistical Source Data.
Source Data Fig. 6
Statistical Source Data.
Source Data Extended Data Fig. 1
Statistical Source Data.
Source Data Extended Data Fig. 2
Statistical Source Data.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, C., Virgilio, M.C., Collins, K.L. et al. Multi-omic single-cell velocity models epigenome–transcriptome interactions and improves cell fate prediction. Nat Biotechnol 41, 387–398 (2023). https://doi.org/10.1038/s41587-022-01476-y
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/s41587-022-01476-y
This article is cited by
-
TFvelo: gene regulation inspired RNA velocity estimation
Nature Communications (2024)
-
Single-cell multi-ome regression models identify functional and disease-associated enhancers and enable chromatin potential analysis
Nature Genetics (2024)
-
Prediction of single-cell RNA expression profiles in live cells by Raman microscopy with Raman2RNA
Nature Biotechnology (2024)
-
Studying temporal dynamics of single cells: expression, lineage and regulatory networks
Biophysical Reviews (2024)
-
Single-nucleus multi-omics of human stem cell-derived islets identifies deficiencies in lineage specification
Nature Cell Biology (2023)