Help & Documentation
  • CellxGene
  • Find Published Data
  • Contribute and Publish Data
  • Download Published Data
  • Analyze Public Data
    • Get Started
    • Hosted Tutorials
    • scExpression Documentation
      • Get Started
      • Cell Type and Gene Ordering
      • Gene Expression Data Processing
    • Algorithms and Ontologies
  • Annotate and Analyze Your Data
    • Get Started
    • Getting Started: Install, Launch, Quick Start
    • Self Host cellxgene
    • Preparing Data
    • Annotating Data
    • Gene Sets
    • Community Extensions
  • Join the CellxGene User Community
  • Cite cellxgene in your publications
  • Frequently Asked Questions
  • Learn About Single Cell Data Analysis

scExpression — Query Gene Expression Across Tissues

scExpression is a tool that allows users to query the expression of any gene across all data in CELLxGENE Discover. A query results in a dot plot per tissue as explained below.

How to Interpret a Gene Expression Dot Plot

Dot Plot Basics

A dot plot can reveal gross differences in expression patterns across cell types and highlights genes that are moderately or highly expressed in certain cell types.

Dot plots visualize values across two dimensions: color and size (Figure 1). The color of the dot approximates average gene expression. Its size represents the percentage of cells within each cell type that expresses the gene.

Figure 1. Two metrics are represented in gene expression dot plots, gene expression and percentage of expressing cells.

The combination of these metrics in a grid of genes by cell types allows to make qualitative assessments of gene expression (Figure 2).

Genes that are lowly expressed or expressed in a small percentage of cells are difficult to visually identify in a dot plot. This is particularly important for certain marker genes that are specifically but lowly expressed in their target cell types, for example transcription factors and cell-surface receptors.

Figure 2. Types of possible qualitatively assessments in a dot plot.

How to Make Sense of Normalized Values

The data used to create the averages for the dot plot is quantile normalized and it ranges from 0 to 6 (see "Gene Expression Data Processing" section for details). Roughly, low expression has normalized values lower than 2, medium expression ranges from 2 to 4, and high expression is higher than 4 (Figure 3). These values are used for the dot plot color scheme and are constant and comparable across different dot plots. Additionally, the user has the ability to switch to a relative scale that maps the lowest and highest expression values in a dot plot to the min and max colors, thus providing a wider color range for what's shown in a dot plot.

Figure 3. Examples of high, medium and low expression.

The examples in Figure 3 have a relatively constant percentage of cells expressing a gene (dot size), however to identify highly expressed genes the user is advised to pay attention to both the color intensity and the size of the dot.

How to Navigate Cell Types

Cell types in the dot plot (rows) are ordered by default with an algorithm that preserves relationships in the Cell Type ontology (CL). Child cell types are represented by indentation with a maximum depth of 2, any cell types with higher depth than that get truncated to 2.

All cell types from a given tissue are shown as originally annotated in the dataset-of-origin. This leads to scenarios where parent and children cell types may be present in the dot plot, but they are independent observations and the former is not a superset of the latter (Figure 4).

Figure 4. In some cases parent and children cell types are present, but these are independent of each other.

Caveats of Normalization

Given that data are quantile normalized all expression is relative to the cells it is measured in. As such comparisons of absolute expression across cell types could be made if the number of genes measured is equal across all cells. While this assumption is violated, we attempt to minimize negative effects by excluding cells with low gene coverage thus reducing the variance in the number of genes measured across cells.

Nonetheless, caution is advised when finding subtle differences in the dot plot across cell types.

Users interested in evaluating the pre-normalized absolute expression data can access it through our cell api (coming soon).