Annotate Data

Creating Annotations in CELLxGENE Annotate

CZ CELLxGENE Annotate enables users to create and edit categorical annotations within the app. This is commonly used to annotate cell types of data being analyzed for the first time.


Data Lifecycle For Annotations

1. Creating Annotations (Quick Start)

To get started, run:

cellxgene launch mydata.h5ad

To preserve data provenance, cellxgene does not alter the input h5ad file. Rather, newly-created annotations are saved in a specified CSV file:


The default annotations-directory is your current working directory (i.e., the directory you were in when you started cellxgene) You will be prompted to enter a name for your annotations the first time you create a new category We also assign a unique identifier in the form of an 8-character suffix, ########; this helps cellxgene identify your file to avoid overwriting your work

2. Loading, editing and updating existing draft annotations

CELLxGENE Annotate allows you to load and edit compatible draft annotations across multiple sessions.

Compatible annotations are tabular, with category names as column headers; anndata.obs.index as the index; and categorical values (i.e., fewer unique values per column than specified in --max-category-items, default 1000).

There are two options for updating draft annotations.

Autodetect annotations CSV

CELLxGENE Annotate will automatically find and reload your draft annotations in editable mode. This assumes that:

  1. The h5ad filename is the same
  2. You launch cellxgene from the annotations-directory (i.e., the directory that contains your CSV)
  3. You use the same browser and have not cleared your cookies (we use a small cookie to keep track of which user created the file to avoid accidental overwrites; see FAQ)

Specify an annotations CSV

This mode is only appropriate for single-user, local CELLxGENE Annotate instances.

If you'd like to specify the complete file path for your annotations, you can do so by running:

cellxgene launch mydata.h5ad --annotations-file path/to/myfile.csv

Any changes you make will be reflected in the original CSV. If the file does not exist, it will be created. Please note that this file will be overwritten, making this mode inappropriate for hosted / multi-user settings (see below).

3. Merging draft annotations with the main h5ad file

Once you're finished with your annotations, you should finalize and preserve your work by merging your csv into your main h5ad file like this:

import pandas as pd
import scanpy as sc

new_annotations = pd.read_csv('myannotations.csv',
anndata ='mydata.h5ad')
anndata.obs = anndata.obs.join(new_annotations)

Annotations by multiple users

As described in the hosted tutorials, we do not officially support hosted or multi-user use of cellxgene. However, we recognize that the app is often adapted for this purpose, and have tried to provide a "safe path" for multi-user setups that avoids overwriting data.

Specifying a single file name for multiple contributors will result in data overwriting. To avoid this, you can instead specify an output directory and allow CELLxGENE Annotate to assign filenames.

To specify an output directory, run:

cellxgene launch mydata.h5ad --annotations-dir path/to/annotations-directory/

For each user, annotations will be saved as follows: Each user will be prompted to enter a name for their annotations the first time they create a new category We also assign a unique identifier in the form of an 8-character suffix, ########; this helps CELLxGENE Annotate identify their specific file to avoid overwriting others' work Any annotations created in the application will be autosaved in annotations-directory/name-########.csv


How do I know my annotations are saved?

cellxgene autosaves any changes made to your annotations every 3 seconds.

I think I deleted my annotations! Can they be recovered?

Not to worry! We save the last 10 versions of your annotations in annotations-directory/NAME-backups/

What about creating continuous annotations?

Continuous metadata is important! However, these values (e.g., pseudotime) are the result of statistical analyses that are beyond CELLxGENE Annotate's visualization and exploration focused scope. We do, of course, provide visualization of continuous metadata values computed elsewhere and stored in anndata.obs.

I keep getting weird index errors when trying to join my annotations to my anndata?

This is most likely because the h5ad file you are working with is not the original file used to generate the annotations! We recommend merging new annotations in on a regular basis for this reason.

How do you remember my unique ID to match my CELLxGENE Annotate session with my annotations file?

We place a small cookie (file) in your browser that identifies where your draft annotations are saved. This file never leaves your machine, and is never sent to the CELLxGENE team or anyone else.

I have feedback and ideas for you!

We would love to hear your feedback :) Contact us at: