Self-Host CELLxGENE

Hosting CZ CELLxGENE Annotate on the Web

CZ CELLxGENE Annotate is intended to be used by researchers on their local machines. In cases where collaborators are not comfortable installing Annotate, it is possible to host Annotate, so your collaborators only need to open the URL you send them.

We don't officially support web deployment, but we've offered some guidance in the following sections on ways to deploy CELLxGENE Annotate to the web.

General Notes and Cautions

Please consider the following when deploying CELLxGENE Annotate in any "hosted" environment, especially where access from the broader Internet is possible:

  • Information security requires careful configuration of the host environment, including firewall, logging, etc - please follow best practices
  • Annotate includes features which may be inappropriate for a hosted deployment - you may wish to use the following command line option:--disable-diffexp
  • cellxgene launch currently uses Flask's development server, which is not recommended for hosted deployment (see the Flask documentation)
  • We have no testing or official support for deployments where multiple users are accessing the same Annotate instance
  • Your Annotate instance is likely to hang or crash if too many people access it at the same time, especially if they using functions that call the Python backend (such as differential expression, noted above)
  • Annotate only supports one instance per dataset

If you believe you have found a security-related issue with CELLxGENE Annotate, please report the issue immediately to security@chanzuckerberg.com.

Configuration Options

The following configuration options require special consideration in any multi-user or hosted environment:

--disable-diffexp: the differential expression computation can be resource intensive, in particular for large datasets. If many differential expression calculation requests are made in rapid sequence, it may cause the server CPU or memory resources to be exhausted, and impact the ability of other users to access data. This command line option will disable the differential expression feature, including the removal of the Differential expression button.

--disable-annotations: annotations, which is enabled by default, may not be appropriate for hosted environments. It will write to the local file system, and in extreme cases could be used to abuse (or exceed) file system capacity on the hosting server. We recommend disabling this with this flag.

--annotations-file: this specifies a single file for all end-user annotations, and is incompatible with hosted or multi-user use of Annotate. Using it will cause loss of user annotation data (ie, the CSV file will be overwritten). If you wish to explore using the annotations feature in a multi-user environment, please refer to the annotations documentation, and in particular the --annotations-dir flag.

Community Software Projects

There are a number of teams building tools or infrastructure to better utilize Annotate in a multiple user environment. While we do not endorse any particular solution, you may find the following helpful.

If you know of other solutions, drop us a note and we'll add to this list.

Community Self-Hosting Approaches

CELLxGENE Annotate Gateway/Apache2 Reverse Proxy with Authentication

Contributors

Fabian Rost and Alexandre Mestiashvili

Hosting Use Case

  • Private self-hosting of multiple datasets for multiple groups
  • Groups only have access to datasets that they "own"
  • Password Authentication

General Description

Start multiple cellxgene-gateway instances for each of the different user groups and use a reverse proxy for authentication and forwarding to the different CELLxGENE Annotate instances.

Components

CELLxGENE Gateway

  • Application to be run

Apache Reverse Proxy

  • Used as a reverse proxy to redirect web requests and provide authentication

Tutorial and Code

AWS/Linux Server with Basic Authentication

Contributors

Lisa Sikkema and Thomas Walzthoeni

Hosting Use Case

  • Private self-hosting on an EC2 instance
  • Access by external collaborators and potentially password authentication

Components:

  • CELLxGENE Annotate: the application to be run
  • Nginx: used as a reverse proxy to redirect web requests to the WSGI
  • uWSGI: forwards requests from the Nginx web server to our python flask framework
  • Htpasswd: used to allow basic authentication for web requests

References:

For a general reference, you can refer to this article on how to deploy a flask application (a.k.a. CELLxGENE Annotate) to an AWS EC2 instance - or a general linux server using Nginx for receiving web requests, uWSGI to redirect those requests to the python application and htpasswd to add a basic authentication layer over the server.

This general guide can be modified with a few components that would be specific to a CELLxGENE Annotate deployment. First, the flask application which is deployed on your server (found in the ‘prep your project’ section of the linked article) should be cellxgene with an appropriate wsgi.py file added as described in the article. Second, the Nginx config file needs to be modified, an example config file may look like the following:

# Default route to login
server {
    listen 443 ssl;
    server_name YOURHOSTNAME;

    # SSL
    ssl_certificate /etc/nginx/conf.d/domain.crt;
    ssl_certificate_key /etc/nginx/conf.d/domain.key;

    location /cellxgene_example {
        proxy_set_header Accept-Encoding "";
        proxy_pass
http://localhost:5007/
;
        proxy_set_header Host                 $http_host;
        proxy_set_header X-Real-IP            $remote_addr;
        proxy_set_header X-Forwarded-For      $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto    $scheme;
        # replacements
        sub_filter 'href="static' 'href="cellxgene_example/static';
        sub_filter 'src="static' 'src="cellxgene_example/static';
        sub_filter_types *;
        sub_filter_once off;
        # To add basic authentication to v2 use auth_basic setting.
        auth_basic "Registry realm";
        auth_basic_user_file /etc/nginx/conf.d/nginx.htpasswd;
        client_max_body_size 50M;
    }

}

server {
    listen 80 default_server;

    server_name _;

    return 301
https://$host$request_uri
;
}

With respect to above Nginx config specification, one would start the browser using port 5007 and forward a request from https://YOURHOSTNAME/cellxgene_example to the cxg browser (i.e. http://localhost:5007).

Note regarding storage and memory on the EC2 instance (if you implement using AWS):

Since datasets will be loaded into the memory of your machine, ram costs could potentially be high, especially if you plan to host multiple large datasets. To avoid higher than necessary costs, you can store data on S3 and only pull the data onto the machine’s memory when the CELLxGENE Explorer is launched. This should be compatible with the approach detailed above (albeit by tweaking a config variable to point to where the data is located).