Census Models

This page features models and integrated embeddings of the Census data corpus, organized by CELL×GENE’s level of involvement with their maintenance and availability. These models are breaking new ground and will continue to improve. We encourage you to try out these models and provide feedback!

Please see these tutorials for usage details.

If you’d like to have your project featured here, please get in touch.

CELL×GENE Collaboration Projects

These models and their output embeddings are ongoing collaborations. CZI and the partner labs are improving the models as the Census resource grows. Embeddings are accessible via the Census API; corresponding models are available for download.
Please contact the CELL×GENE team with feedback.

scVI integrated embeddings with explicit modeling of batch effects

CELLxGENE Discover Team at CZI · Nir Yosef at Weizmann Institute of Science, Israel · Can Ergen & Martin Kim at UC Berkeley

scVI uses auto-encoding variational Bayes optimization to learn the underlying latent state of gene expression and to approximate the distributions that underlie observed expression values, while accounting for batch effects and limited sensitivity. These cell embeddings are derived from an scVI model trained on all primary Census cells, with sequencing assay, dataset, donor, and suspension type (cell vs. nucleus) modeled as batch effects. The embeddings were then obtained as the latent representation of every Census cell by running a forward pass through the trained model. These embeddings are made in collaboration with the scVI team from Nir Yosef’s laboratory. For questions about scVI, please refer to the scverse Discourse forum at https://discourse.scverse.org/.
Contact: CELLxGENE Discover Team
Last Updated: November 18, 2023
Census Version: 2023-12-15
Measurement: RNA
Embedding: obs
Columns: 200
Organism: homo_sapiens, Cells: 63M
Organism: mus_musculus, Cells: 5.7M
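
To make the "forward pass through the trained model" step concrete, here is a minimal sketch of how an encoder could map a cell's counts to a 200-column latent embedding. It is a toy NumPy illustration, not scVI's implementation: the weights are random rather than trained, the layer sizes other than the 200 latent columns are arbitrary, and feeding a one-hot batch label to the encoder is an assumption made for illustration. All names (`encode`, `one_hot`, `params`) are hypothetical.

```python
import numpy as np


def one_hot(indices, n_categories):
    """Return a (len(indices), n_categories) one-hot matrix."""
    out = np.zeros((len(indices), n_categories))
    out[np.arange(len(indices)), indices] = 1.0
    return out


def encode(counts, batch_ids, params, n_batches):
    """Toy encoder forward pass: per-cell latent mean and variance."""
    x = np.log1p(counts)  # compress the dynamic range of raw counts
    # Assumption: the batch label is appended to the encoder input.
    x = np.concatenate([x, one_hot(batch_ids, n_batches)], axis=1)
    hidden = np.maximum(0.0, x @ params["w_hidden"] + params["b_hidden"])  # ReLU
    mean = hidden @ params["w_mean"] + params["b_mean"]
    var = np.exp(hidden @ params["w_logvar"] + params["b_logvar"])
    return mean, var


rng = np.random.default_rng(0)
n_cells, n_genes, n_batches, n_hidden, n_latent = 8, 50, 3, 16, 200

# Random stand-ins for trained weights (hypothetical, for illustration only).
params = {
    "w_hidden": rng.normal(scale=0.1, size=(n_genes + n_batches, n_hidden)),
    "b_hidden": np.zeros(n_hidden),
    "w_mean": rng.normal(scale=0.1, size=(n_hidden, n_latent)),
    "b_mean": np.zeros(n_latent),
    "w_logvar": rng.normal(scale=0.1, size=(n_hidden, n_latent)),
    "b_logvar": np.zeros(n_latent),
}

# Synthetic counts and batch labels standing in for real cells.
counts = rng.poisson(2.0, size=(n_cells, n_genes))
batch_ids = rng.integers(0, n_batches, size=n_cells)

latent_mean, latent_var = encode(counts, batch_ids, params, n_batches)
embedding = latent_mean  # one row per cell, one column per latent dimension
print(embedding.shape)  # (8, 200)
```

In this sketch the latent mean serves as the cell embedding, which is a common convention for variational autoencoders; the page does not state which quantity the Census embeddings store.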

Geneformer embeddings fine-tuned for CELLxGENE Census cell subclass classification

CELLxGENE Discover Team at CZI

Geneformer is a foundation transformer model pretrained on a large-scale corpus of ~30 million single-cell transcriptomes to enable context-aware predictions in settings with limited data in network biology. These cell embeddings are derived from a Geneformer model that CZI fine-tuned for cell subclass classification. Because the fine-tuning procedure remains experimental and wasn’t performed by the Geneformer authors, these embeddings should not be used to assess the performance of the pre-trained Geneformer model.
Last Updated: November 6, 2023
Census Version: 2023-12-15
Organism: homo_sapiens
Measurement: RNA
Embedding: obs
Cells: 63M
Columns: 512

Community Projects

The community has also developed many wonderful projects using Census data. While CELL×GENE does not directly host or maintain these projects, we’re excited to showcase them here.
Please contact their creators with questions or feedback.

PINNACLE: Contextual AI Model for Single-Cell Protein Biology

Marinka Zitnik, Michelle M. Li & Yepeng Huang at Zitnik Lab, Department of Biomedical Informatics, Harvard University

PINNACLE is a flexible geometric deep learning model that is trained on protein interaction networks contextualized by 156 cell types from a human single-cell transcriptomic atlas in order to generate context-aware protein representations. It generates 394,760 protein representations (~18X more than existing methods) that reflect the organization of 156 cell types spanning 62 tissues. We demonstrate that PINNACLE’s contextualized protein representations 1) enable zero-shot retrieval of the tissue hierarchy, 2) improve 3D structure predictions of PD-1/PD-L1 and B7-1/CTLA-4 protein interactions, two critical immune checkpoint interactors targeted by cancer immunotherapies, and 3) outperform state-of-the-art, yet context-free, models in nominating therapeutic targets of RA and IBD while pinpointing their most predictive cell type contexts.
Last Updated: December 13, 2023

scCIPHER: Contextual Deep Learning on Single-Cell Knowledge Graphs for Precision Medicine in Neurological Disorders

Marinka Zitnik & Ayush Noori at Zitnik Lab, Department of Biomedical Informatics, Harvard University

Neurological disorders are the leading driver of global disability and cause 16.8% of global mortality. Unfortunately, most lack disease-modifying treatments or cures. To address disease complexity and heterogeneity in neurological disease, we developed scCIPHER, an AI approach for Contextually Informed Precision HEalthcaRe using deep learning on single-cell knowledge graphs. We created the Neurological Disease Knowledge Graph (NeuroKG), a neurobiological knowledge graph with 132 thousand nodes and 3.98 million edges, by integrating 20 high-quality primary data sources with single-cell RNA-sequencing data from 3.37 million cells across 106 regions of the adult human brain. Next, we pre-trained a heterogeneous graph transformer on NeuroKG to create scCIPHER. We leverage scCIPHER to make precision medicine-based predictions in neurological disorders across patient phenotyping, therapeutic response prediction, and causal gene discovery tasks, with validation in large-scale patient cohorts.
Last Updated: December 13, 2023

Identifying regulatory network changes between healthy and cancerous tissues with single-cell atlas

Matt Thomson, Jialong Jiang & Yingying Gong at Thomson Lab, Caltech

A major question in cancer biology is how cancer cells rewire their regulatory networks to support uncontrolled cell growth. We developed a computational framework, D-SPIN, that decodes regulatory network models from single-cell profiles of multiple samples and various conditions. With a collection of healthy and cancerous tissue profiles from Census data, we identified a hundred gene programs shared between healthy and cancer tissues, constructed regulatory network models of the cell population, and observed a series of regulatory shifts in key functions such as energy metabolism and cell adhesion in cancer tissues. Our study can help identify network-level signatures of cancer progression and propose critical regulatory shifts as therapeutic targets.
Last Updated: December 12, 2023