CELLxGENE logo
CollectionsDatasetsscExpression
Beta
Help & Documentation
  • CellxGene
  • Find Published Data
  • Contribute and Publish Data
  • Download Published Data
  • Analyze Public Data
    • Get Started
    • Hosted Tutorials
    • scExpression Documentation
      • Get Started
      • Cell Type and Gene Ordering
      • Gene Expression Data Processing
    • Algorithms and Ontologies
  • Annotate and Analyze Your Data
    • Get Started
    • Getting Started: Install, Launch, Quick Start
    • Self Host cellxgene
    • Preparing Data
    • Annotating Data
    • Gene Sets
    • Community Extensions
  • Join the CellxGene User Community
  • Cite cellxgene in your publications
  • Frequently Asked Questions
  • Learn About Single Cell Data Analysis

Contributing Data

CELLxGENE supports a rapidly growing single cell data corpus because of generous contributions from researchers like you! If you have single cell data that meet our requirements, please reach out to our curation team at cellxgene@chanzuckerberg.com.

Publishing Process

The process for submission to the portal is:

  • You confirm we can support your submission by reaching out to our curation team at cellxgene@chanzuckerberg.com with a description of the data that you'd like to contribute.
  • We confirm that we will accept your data.
  • You prepare your data according to our submission requirements and send us your files.
  • We upload to a private collection where you can review.
  • You prepare revised data and send us your revised files, as needed.
  • We publish the data when you tell us to.

Dataset Requirements

Data Eligibility

CELLxGENE is focused on supporting the global community attempting to create references of human cells and tissues. As a result, there are a few kinds of datasets that we are not likely to accept at this time:

  • drug screens
  • cell lines
  • organisms other than mouse or human

Formatting Requirements

We need the following collection metadata (i.e. details associated with your publication or study)

  • Collection information:
    • Title
    • Description
    • Contact: name and email
    • Publication/preprint DOI: can be added later
    • URLs: any additional URLs for related data or resources, such as GEO or protocols.io - can be added later

Datasets need the following information added to a single h5ad (AnnData 0.7) format file per CELLxGENE visualization:

  • Dataset-level metadata in uns:
    • schema_version: 2.0.0
    • title: title of the individual dataset visualization
    • X_normalization: method used to normalize the data stored in X and any scaling or logging; 'none' if data in X is raw
    • optional: batch_condition: list of obs fields that define that “batches” that a normalization or integration algorithm should be aware of
  • Data in .X and raw.X:
    • raw counts are required
    • normalized counts are strongly recommended
    • raw counts should be in raw.X if normalized counts are in .X
    • if there is no normalized matrix, raw counts should be in .X
  • Cell metadata in obs (the values MUST be the most specific specific term available from the specified ontology):
    • organism_ontology_term_id: NCBITaxon (NCBITaxon:9606 for human, NCBITaxon:10090 for mouse)
    • tissue_ontology_term_id: UBERON
    • assay_ontology_term_id: EFO
    • disease_ontology_term_id: MONDO or PATO:0000461 for 'normal'
    • cell_type_ontology_term_id: CL
    • ethnicity_ontology_term_id: HANCESTRO if human, unknown if information unavailable, na if non-human
    • development_stage_ontology_term_id: HsapDv if human, MmusDv if mouse, unknown if information unavailable
    • sex_ontology_term_id: PATO:0000384 for male, PATO:0000383 for female, or unknown if unavailable
  • Embeddings in obsm:
    • One or more two-dimensional embeddings, prefixed with 'X_'
  • Features in var & raw.var (if present):
    • index is Ensembl ID
    • preference is that gene have not been filtered in order to maximize future data integration efforts

Data Submission Policy

I give CZI permission to display, distribute, and create derivative works (e.g. visualizations) of this data for purposes of offering CELLxGENE Discover, and I have the authority to give this permission. It is my responsibility to ensure that this data is not identifiable. In particular, I commit that I will remove any direct personal identifiers in the metadata portions of the data, and that CZI may further contact me if it believes more work is needed to de-identify it. If I choose to publish this data publicly on CELLxGENE Discover, I understand that (1) anyone will be able to access it subject to a CC-BY license, meaning they can download, share, and use the data without restriction beyond providing attribution to the original data contributor(s) and (2) the Collection details (including collection name, description, my name, and the contact information for the datasets in this Collection) will be made public on CELLxGENE Discover as well. I understand that I have the ability to delete the data that I have published from CELLxGENE Discover if I later choose to. This however will not undo any prior downloads or shares of such data.