Frequently Asked Questions:
We’ve compiled a list of the questions we get most frequently. Check them out below:
CZ CELLxGENE Discover
How can I find a dataset of interest?
Refer to this tutorial to learn how the data portal is organized and how to search by dataset metadata.
How can I request a particular dataset to be on the portal?
You can request a particular dataset by filing a GitHub issue on the CELLxGENE single-cell-curation repo.
CZ CELLxGENE Explorer
Is there a way to shift the color scales for continuous variables?
You can use the clipping feature to remove outliers (based on percentiles) and shift the color scale. See slide 8 of this tutorial for a demonstration of the feature.
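To illustrate what percentile-based clipping does to a continuous variable, here is a minimal sketch in plain Python. It is not the Explorer's implementation; the function names and the linear-interpolation percentile are assumptions chosen for illustration.

```python
def percentile(sorted_values, pct):
    """Linearly interpolated percentile (pct in 0..100) of pre-sorted values."""
    if not sorted_values:
        raise ValueError("need at least one value")
    rank = (len(sorted_values) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def clip_to_percentiles(values, low_pct, high_pct):
    """Clamp every value into the [low_pct, high_pct] percentile range."""
    ordered = sorted(values)
    low = percentile(ordered, low_pct)
    high = percentile(ordered, high_pct)
    return [min(max(v, low), high) for v in values]


# One extreme value (100) would otherwise stretch the whole color scale.
expression = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
clipped = clip_to_percentiles(expression, 10, 90)
print(clipped)
```

After clipping, the color scale spans only the retained range, so differences among typical values become visible.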
If I have multiple levels of cell type annotations, how can I visualize cell type hierarchy within CELLxGENE?
While we don't offer this capability, you can make use of the subsetting feature to reduce the cells in view to just your major subtype of interest. See
How is Gene Expression Integration Performed?
Please see the Gene Expression documentation.
CZ CELLxGENE Annotate
I tried to pip install cellxgene and got a weird error I don't understand
This may happen, especially as we work out bugs in our installation process! Please create a new GitHub issue, explain what you did, and include all the error messages you saw. It'd also be super helpful if you run pip freeze and include the full output alongside your issue.
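If it helps, the environment details for an issue can be gathered in one step. The following is a small sketch, not an official tool: the function name collect_environment_report is hypothetical, and it simply wraps pip freeze together with the Python version and platform.

```python
import platform
import subprocess
import sys


def collect_environment_report():
    """Return a text block with Python version, platform, and pip freeze output."""
    lines = [
        f"Python: {sys.version.split()[0]}",
        f"Platform: {platform.platform()}",
    ]
    # Run pip through the current interpreter so the right environment is listed.
    result = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode == 0:
        lines.append("pip freeze output:")
        lines.append(result.stdout.strip())
    else:
        lines.append("pip freeze failed:")
        lines.append(result.stderr.strip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(collect_environment_report())
```

Paste the printed text into the issue alongside the full error message.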
What are the requirements for an anndata object to be consumed by CELLxGENE Annotate?
Take a look at the data format requirements.
How can I remove categorical metadata from my dataset that I do not wish to visualize?
All metadata is read from adata.obs. CELLxGENE Annotate detects columns in this table and displays them in the UI. To remove these categories from the interface, you simply need to remove them from that table.
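As a rough sketch of dropping metadata columns, the example below uses a small pandas DataFrame as a stand-in for adata.obs (which is a pandas DataFrame in anndata). The column names and the unwanted list are hypothetical.

```python
import pandas as pd

# Stand-in for adata.obs; with a real AnnData object you would start from
# adata.obs instead of building a table by hand.
obs = pd.DataFrame(
    {
        "cell_type": ["T cell", "B cell", "T cell"],
        "batch": ["a", "b", "a"],
        "internal_qc_flag": ["ok", "ok", "fail"],
    },
    index=["cell_0", "cell_1", "cell_2"],
)

# Hypothetical columns that should not appear in the UI.
unwanted = ["batch", "internal_qc_flag"]

# errors="ignore" skips names that are absent instead of raising KeyError.
obs_trimmed = obs.drop(columns=unwanted, errors="ignore")
print(list(obs_trimmed.columns))

# With anndata, the equivalent steps would be roughly:
#   adata.obs = adata.obs.drop(columns=unwanted, errors="ignore")
#   adata.write_h5ad("trimmed.h5ad")  # output file name is an assumption
```

Writing the trimmed object to a new file keeps the original dataset intact in case the columns are needed later.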
Once loaded in and viewing the UMAP, one of my categories is failing to color the UMAP. In the drop down menu, it is showing that it assigns the colors to each observation, but over the UMAP it says "Failure loading umap". What is the problem?
It may be that you have invalid values in your categorical metadata field of interest. Check for values such as NA and recast them as a string with the appropriate value (i.e.
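One possible way to recast missing values, sketched with pandas on a stand-in for adata.obs. The placeholder string "unknown" and the helper name fill_missing_labels are assumptions; choose whatever label suits your data.

```python
import pandas as pd

# Stand-in for a categorical column in adata.obs with missing entries.
obs = pd.DataFrame(
    {"cell_type": pd.Categorical(["T cell", None, "B cell", None])}
)


def fill_missing_labels(series, placeholder="unknown"):
    """Replace missing values with a string label and return a categorical."""
    # Convert to object first so a new label can be introduced freely.
    as_object = series.astype(object)
    filled = as_object.where(series.notna(), placeholder)
    return filled.astype(str).astype("category")


print("missing before:", int(obs["cell_type"].isna().sum()))
obs["cell_type"] = fill_missing_labels(obs["cell_type"])
print("missing after:", int(obs["cell_type"].isna().sum()))
print(list(obs["cell_type"]))
```

Converting through object avoids the error pandas raises when a categorical column is filled with a label that is not already one of its categories.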
I have a BIG dataset, how can I make CELLxGENE Annotate run as fast as possible?
If your dataset requires gigabytes of disk space, you may need to select an appropriate storage format in order to effectively utilize cellxgene.
Tips and tricks:
- cellxgene is optimized for columnar data access. For large datasets, format the expression matrix (.X) as either a SciPy CSC sparse matrix or a dense NumPy array (whichever creates a smaller h5ad file). If you are using cellxgene prepare, include the --sparse flag to ensure .X is formatted as a CSC sparse matrix (by default, .X will be a dense matrix).
- By default, cellxgene loads the dataset into memory, and start time is directly proportional to h5ad file size and the speed of your file system. Expect that large (e.g., million cell) datasets will take minutes to load, even on relatively fast computers with a high performance local hard drive. Once loaded, exploring metadata should still be quick. If this start time is a problem, try the --backed flag, which will attempt to lazily load data as needed (caveat: subsequent data access may be slower).
- If your dataset size exceeds the size of memory (RAM) on the host computer, differential expression calculations will be extremely slow (or fail, if you run out of virtual memory). In this case, we recommend running with the --disable-diffexp flag. For datasets that are extremely large, you may also find the --backed flag improves your ability to explore them.
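The tips above can be sketched as two small helpers. The first gives a rough, uncompressed size estimate for a dense array versus a CSC sparse matrix (actual h5ad file size also depends on compression, so treat it only as a guide). The second assembles a launch command from the flags mentioned above. The function names, the 4-byte value and index sizes, and the use of the cellxgene launch subcommand are assumptions for illustration.

```python
def estimate_matrix_bytes(n_cells, n_genes, n_nonzero, value_bytes=4, index_bytes=4):
    """Rough uncompressed sizes of a dense array vs. a CSC sparse matrix."""
    dense = n_cells * n_genes * value_bytes
    # CSC stores one value and one row index per nonzero entry,
    # plus a column-pointer array of length n_genes + 1.
    csc = n_nonzero * (value_bytes + index_bytes) + (n_genes + 1) * index_bytes
    return {"dense": dense, "csc": csc}


def build_launch_command(h5ad_path, slow_start=False, exceeds_ram=False):
    """Assemble an argument list using the flags described in the tips above."""
    command = ["cellxgene", "launch", h5ad_path]
    if slow_start:
        # Lazily load data instead of reading the whole file up front.
        command.append("--backed")
    if exceeds_ram:
        # Differential expression is very slow when the data does not fit in RAM.
        command.append("--disable-diffexp")
    return command


# Hypothetical dataset: 1,000,000 cells x 20,000 genes, about 5% nonzero.
n_cells, n_genes = 1_000_000, 20_000
sizes = estimate_matrix_bytes(n_cells, n_genes, n_nonzero=n_cells * n_genes // 20)
smaller = min(sizes, key=sizes.get)
print(f"dense: {sizes['dense']:,} bytes, csc: {sizes['csc']:,} bytes -> prefer {smaller}")

print(" ".join(build_launch_command("data.h5ad", slow_start=True, exceeds_ram=True)))
```

The returned list can be passed to subprocess.run, or joined into a string and run from a terminal. For extremely large datasets, setting slow_start=True as well adds --backed, as suggested in the last tip.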