CELLxGENE logo
CollectionsDatasetsscExpression
Beta
Help & Documentation
  • CellxGene
  • Find Published Data
  • Contribute and Publish Data
  • Download Published Data
  • Analyze Public Data
    • Get Started
    • Hosted Tutorials
    • scExpression Documentation
      • Get Started
      • Cell Type and Gene Ordering
      • Gene Expression Data Processing
    • Algorithms and Ontologies
  • Annotate and Analyze Your Data
    • Get Started
    • Getting Started: Install, Launch, Quick Start
    • Self Host cellxgene
    • Preparing Data
    • Annotating Data
    • Gene Sets
    • Community Extensions
  • Join the CellxGene User Community
  • Cite cellxgene in your publications
  • Frequently Asked Questions
  • Learn About Single Cell Data Analysis

Algorithms and Ontologies

Differential Expression

Differential expression calculated by CELLxGENE Annotate and CELLxGENE Explorer uses a Welch's t-test, which assumes that the two populations are normally distributed, but may have unequal variance. We use a two-sided t-test against the null hypothesis that the two populations have equal means. P-values are adjusted with the Bonferroni correction.

To help avoid spurious results, we only return genes with log fold change greater than 0.01:

|log2( mean(set1) / mean(set2) )| > 0.01

We then sort genes by their associated |t value| and return the top 15 genes.

Cell type ontology ordering

Cell types in a dot plot can be ordered using lineage information to maximize the utility of scExpression for pattern discovery and readability. The cell ontology (CL) directed acyclic graph (DAG) provides a good approximation to biological lineages. Thus we have a developed a method to create an ordered list of cell types based on the information contained in the CL DAG.

The method traverses the DAG starting from the root until reaching the leftmost bottom node adding nodes present in the data along the way. Once it reaches the bottom, it recursively repeats the process for any siblings. After, it traverses back and repeats the sibling process until exhaustion.

Algorithm

  1. Load the CL ontology, using either the cl.obo or cl.owl from the source file here.
  2. Subset the ontology to only contain the cell types of interest along with their ancestors.
  3. Create a 2-dimensional representation of the DAG using the dot method of graphiz. See here for a full description of this method.
  4. Traverse the tree starting at the root:
    1. If node is present in cell types of interest, add to the final ordered list.
    2. Remove the node from nodes of interest.
    3. Get children nodes
    4. Perform recursion for each child node, going left-to-right through the children nodes.
  5. Return the ordered list

Example

The following is an example of how a subset of the CL DAG looks like when rendering it with the dot algorithm from graphiz. Orange nodes represent cell types that would be present in the data corpus and are the target labels to be ordered.

The cell type ordering algorithm described above would return the following ordered list of cell types.

  • Native Cell
  • Secretory Cell
  • Club Cell
  • Chondrocyte
  • Goblet Cell
  • Mast Cell
  • T Cell
  • CD4 positive, Alpha beta T Cell
  • CD8 positive, Alpha beta T Cell
  • B Cell
  • Natural Killer
  • Dendritic Cell
  • Monocyte
  • Macrophage
  • Alveolar Macrophage
  • Basal Cell
  • Fibroblast
  • Megakaryocyte
  • Enucleate Erythrocyte
  • Myofibroblast Cell
  • Mesothelial Cell
  • Endothelial Cell
  • Blood Vessel Endothelial Cell
  • Smooth Muscle Cell
  • Vascular Associated Smooth Muscle Cell
  • Epithelial Cell
  • Squamous Epithelial Cell
  • Ciliated Cell