Census Models

This page features models and integrated embeddings of the Census data corpus, organized by CELL×GENE’s level of involvement with their maintenance and availability. These models are breaking new ground and will continue to improve. We encourage you to try out these models and provide feedback!

Please see these tutorials for usage details.

If you’d like to have your project featured here, please get in touch.

Community Projects

The community has also developed many wonderful projects using Census data. While CELL×GENE does not directly host or maintain these projects, we’re excited to showcase them here.
Please contact their creators with questions or feedback.

PINNACLE: Contextual AI Model for Single-Cell Protein Biology

Marinka Zitnik, Michelle M. Li & Yepeng Huang at Zitnik Lab, Department of Biomedical Informatics, Harvard University

PINNACLE is a flexible geometric deep learning model trained on protein interaction networks contextualized by 156 cell types from a human single-cell transcriptomic atlas to generate context-aware protein representations. It generates 394,760 protein representations (~18× more than existing methods) that reflect the organization of 156 cell types spanning 62 tissues. We demonstrate that PINNACLE’s contextualized protein representations 1) enable zero-shot retrieval of the tissue hierarchy, 2) improve 3D structure predictions of the PD-1/PD-L1 and B7-1/CTLA-4 protein interactions, two critical immune checkpoint interactor pairs targeted by cancer immunotherapies, and 3) outperform state-of-the-art, yet context-free, models in nominating therapeutic targets for rheumatoid arthritis (RA) and inflammatory bowel diseases (IBD) while pinpointing their most predictive cell type contexts.
Last Updated
December 13, 2023

scCIPHER: Contextual Deep Learning on Single-Cell Knowledge Graphs for Precision Medicine in Neurological Disorders

Marinka Zitnik & Ayush Noori at Zitnik Lab, Department of Biomedical Informatics, Harvard University

Neurological disorders are the leading driver of global disability and account for 16.8% of global mortality. Unfortunately, most lack disease-modifying treatments or cures. To address the complexity and heterogeneity of neurological disease, we developed scCIPHER, an AI approach for Contextually Informed Precision HEalthcaRe using deep learning on single-cell knowledge graphs. We created the Neurological Disease Knowledge Graph (NeuroKG), a neurobiological knowledge graph with 132 thousand nodes and 3.98 million edges, by integrating 20 high-quality primary data sources with single-cell RNA-sequencing data from 3.37 million cells across 106 regions of the adult human brain. Next, we pre-trained a heterogeneous graph transformer on NeuroKG to create scCIPHER. We leverage scCIPHER to make precision medicine-based predictions in neurological disorders across patient phenotyping, therapeutic response prediction, and causal gene discovery tasks, with validation in large-scale patient cohorts.
Last Updated
December 13, 2023

Identifying regulatory network changes between healthy and cancerous tissues with a single-cell atlas

Matt Thomson, Jialong Jiang & Yingying Gong at Thomson Lab, Caltech

A major question in cancer biology is how cancer cells rewire their regulatory networks to support uncontrolled cell growth. We developed a computational framework, D-SPIN, that infers regulatory network models from single-cell profiling data collected across multiple samples and conditions. Using a collection of healthy and cancerous tissue profiles from Census data, we identified a hundred gene programs shared between healthy and cancerous tissues, constructed regulatory network models of the cell population, and observed a series of regulatory shifts in key functions such as energy metabolism and cell adhesion in cancerous tissues. Our study can help identify network-level signatures of cancer progression and propose critical regulatory shifts as therapeutic targets.
Last Updated
December 12, 2023