Frequently Asked Questions:

We’ve compiled a list of the questions we get most frequently. Check them out below:

CZ CELLxGENE Discover

How can I find a dataset of interest?

Refer to this tutorial to learn how the data portal is organized and how you can search by dataset metadata.

How can I request a particular dataset to be on the portal?

You can request a particular dataset by filing a GitHub issue on the CELLxGENE single-cell-curation repo.

CZ CELLxGENE Explorer

Is there a way to shift the color scales for continuous variables?

You can use the clipping feature to remove outliers (based on percentiles) and shift the color scale. See this tutorial for a demonstration of the feature on slide 8.

If I have multiple levels of cell type annotations, how can I visualize cell type hierarchy within CELLxGENE?

While we don't offer this capability, you can make use of the subsetting feature to reduce the cells in view to just your major subtype of interest. See

Gene Expression

How is Gene Expression Integration Performed?

Please see the Gene Expression documentation.

CZ CELLxGENE Annotate

I tried to pip install cellxgene and got a weird error I don't understand

This may happen, especially as we work out bugs in our installation process! Please create a new GitHub issue, explain what you did, and include all the error messages you saw. It would also be very helpful if you run pip freeze and include the full output alongside your issue.

What are the requirements for an anndata object to be consumed by CELLxGENE Annotate?

Take a look at the data format requirements.

How can I remove categorical metadata from my dataset that I do not wish to visualize?

All metadata is read from adata.obs. CELLxGENE Annotate detects columns in this table and displays them in the UI. To remove these categories from the interface, you simply need to remove them from the obs dataframe.
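As a rough sketch of the column removal described above: adata.obs is a pandas DataFrame, so unwanted columns can be dropped with ordinary pandas calls. The helper name drop_obs_columns, the column names, and the small stand-in table are hypothetical and used only for illustration.

```python
import pandas as pd


def drop_obs_columns(obs: pd.DataFrame, unwanted: list) -> pd.DataFrame:
    """Return a copy of `obs` without the listed columns.

    Columns that are not present are skipped instead of raising an error.
    """
    present = [col for col in unwanted if col in obs.columns]
    return obs.drop(columns=present)


# Hypothetical stand-in for adata.obs; real column names will differ.
obs = pd.DataFrame(
    {
        "cell_type": ["T cell", "B cell"],
        "batch": ["b1", "b2"],
        "internal_qc_flag": [True, False],
    },
    index=["cell_0", "cell_1"],
)

cleaned = drop_obs_columns(obs, ["internal_qc_flag", "not_a_column"])
print(list(cleaned.columns))

# With a real AnnData object (assumed usage, not run here):
#   adata.obs = drop_obs_columns(adata.obs, ["internal_qc_flag"])
#   adata.write_h5ad("cleaned.h5ad")
```

The helper returns a new table and leaves the original untouched, so you can inspect the result before assigning it back to adata.obs.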

Once loaded in and viewing the UMAP, one of my categories is failing to color the UMAP. In the drop down menu, it is showing that it assigns the colors to each observation, but over the UMAP it says "Failure loading umap". What is the problem?

It may be that you have invalid values in your categorical metadata field of interest. Check for missing values such as NULL or NA and recast them as a string with an appropriate value (e.g. 'NA').
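One possible way to do this recasting with pandas is sketched below. The helper name fill_missing_categories, the column name, and the example values are assumptions for illustration; the label 'NA' follows the suggestion above.

```python
import pandas as pd


def fill_missing_categories(obs: pd.DataFrame, column: str, label: str = "NA") -> pd.DataFrame:
    """Return a copy of `obs` where missing values in `column` become the string `label`."""
    out = obs.copy()
    # Convert to plain objects first so a label outside the existing categories can be assigned.
    values = out[column].astype(object)
    # Keep non-missing values, replace missing ones (NaN/None) with the label.
    values = values.where(values.notna(), label)
    # Store everything as strings, then restore a categorical dtype.
    out[column] = values.astype(str).astype("category")
    return out


# Hypothetical stand-in for adata.obs with two missing annotations.
obs = pd.DataFrame(
    {"cell_type": pd.Categorical(["T cell", None, "B cell", None])},
    index=["c0", "c1", "c2", "c3"],
)

fixed = fill_missing_categories(obs, "cell_type")
print(fixed["cell_type"].tolist())

# With a real AnnData object (assumed usage, not run here):
#   adata.obs = fill_missing_categories(adata.obs, "cell_type")
```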

I have a BIG dataset, how can I make CELLxGENE Annotate run as fast as possible?

If your dataset requires gigabytes of disk space, you may need to select an appropriate storage format to use cellxgene effectively. Tips and tricks:

  • cellxgene is optimized for columnar data access. For large datasets, format the expression matrix (.X) as either a SciPy CSC sparse matrix or a dense NumPy array (whichever creates a smaller h5ad file). If you are using cellxgene prepare, include the --sparse flag to ensure .X is formatted as a CSC sparse matrix (by default, .X will be a dense matrix).
  • By default, cellxgene loads the dataset into memory, and start time is directly proportional to h5ad file size and the speed of your file system. Expect that large (e.g., million cell) datasets will take minutes to load, even on relatively fast computers with a high performance local hard drive. Once loaded, exploring metadata should still be quick. If this start time is a problem, try the --backed flag, which will attempt to lazily load data as needed (caveat: subsequent data access may be slower).
  • If your dataset size exceeds the size of memory (RAM) on the host computer, differential expression calculations will be extremely slow (or fail, if you run out of virtual memory). In this case, we recommend running with the --disable-diffexp flag. For datasets that are extremely large, you may also find the --backed flag improves your ability to explore them.
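To illustrate the first tip, here is a minimal sketch that picks between a CSC sparse matrix and a dense NumPy array. It compares in-memory byte counts, which is only a rough proxy for the resulting h5ad file size (compression and file layout are ignored). The helper name smaller_of_csc_or_dense is hypothetical.

```python
import numpy as np
from scipy import sparse


def smaller_of_csc_or_dense(X):
    """Return X as a CSC sparse matrix or a dense array, whichever uses fewer bytes in memory."""
    csc = sparse.csc_matrix(X)  # accepts dense arrays and other sparse formats
    # CSC storage: nonzero values, their row indices, and one column pointer per column (+1).
    csc_bytes = csc.data.nbytes + csc.indices.nbytes + csc.indptr.nbytes
    dense_bytes = csc.shape[0] * csc.shape[1] * csc.dtype.itemsize
    return csc if csc_bytes < dense_bytes else csc.toarray()


# Mostly-zero matrix: CSC should be the smaller representation.
mostly_zero = np.zeros((100, 50), dtype=np.float32)
mostly_zero[0, 0] = 1.0
picked_sparse = smaller_of_csc_or_dense(mostly_zero)

# Fully populated matrix: dense should be the smaller representation.
full = np.ones((10, 10), dtype=np.float32)
picked_dense = smaller_of_csc_or_dense(full)

print(type(picked_sparse).__name__, type(picked_dense).__name__)

# With a real AnnData object (assumed usage, not run here):
#   adata.X = smaller_of_csc_or_dense(adata.X)
```

For a definitive answer, write both variants to disk and compare the actual file sizes, since the tip above is about the h5ad file rather than memory.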