RXRX19a


Download the Dataset

Recursion released a preprint on applying deep-learning-driven analysis of cellular morphology to develop a scalable “phenomics” platform. The preprint demonstrates the capabilities of Recursion’s platform to model complex immune biology and screen for new therapeutics.

INTRODUCTION

SARS-CoV-2, a novel coronavirus, emerged in December 2019 in Wuhan, China.

Over the following months the resulting disease, subsequently named COVID-19, spread across the rest of the world and was declared a pandemic by the World Health Organization on March 11th, 2020. The COVID-19 pandemic has impacted millions worldwide and has the potential to cause a worldwide recession. At the current moment, there are no available vaccines.

Recursion, a digital biology company industrializing drug discovery, conducted several experiments in April 2020 to screen a library of FDA-approved drugs, EMA-approved drugs, and compounds in late-stage clinical trials for their ability to modulate the effect of SARS-CoV-2 on human cells. The results were compiled into the RxRx19a dataset, which comprises 305,520 images and their corresponding deep learning embeddings, nearly 450 gigabytes of data in total. RxRx19a provides the largest publicly available set of human cellular morphological data to researchers around the world who are working to make advances in the fight against the COVID-19 pandemic.

THE BIOLOGY

To create a human SARS-CoV-2 model suitable for Recursion’s drug discovery platform, we infected monolayers of normal human renal cortical epithelial cells (HRCE) and human bronchial epithelial cells.

The cells were then fixed, stained and imaged at 96 hours post-infection. African green monkey kidney epithelial cells (Vero) were also infected as a control condition. The HRCE and Vero cells both demonstrated robust phenotypes compared to the mock and irradiated controls. HRCEs were selected for further high-throughput screening due to their disease relevance and robust disease-specific phenotype.

Chemical suppressor screens were conducted by treating HRCE cells with each compound at six half-log doses, with six replicates per dose, approximately two hours after cell seeding (the concentrations tested may vary for certain reference compounds). At 24 hours post-seeding, cells were infected with SARS-CoV-2 and incubated for 96 hours until fixation, staining and imaging. Recursion then evaluated 1,672 compounds in HRCE, and reference compounds in both HRCE and Vero, using five-channel fluorescence microscopy images in which each channel highlights different organelles of the cell. Images were processed using Recursion’s proprietary deep learning neural network to generate high-dimensional featurizations of each image for the identification of distinct phenotypic profiles.
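To make the dosing scheme concrete, here is a minimal sketch of a half-log dilution series, in which each dose differs from the next by a factor of 10^0.5 (about 3.16). The top concentration of 10 µM and the function name `half_log_series` are illustrative assumptions; the actual concentrations used in the screen are not given here.

```python
# Sketch: a half-log series steps by a factor of 10**0.5 (~3.16x) per dose.
# The top concentration used below is a hypothetical value, not from the dataset.

def half_log_series(top_concentration_um: float, n_doses: int = 6) -> list[float]:
    """Return n_doses concentrations, descending from the top in half-log steps."""
    step = 10 ** 0.5
    return [top_concentration_um / step ** i for i in range(n_doses)]

doses = half_log_series(10.0)   # hypothetical 10 µM top dose
replicates_per_dose = 6         # stated in the text
wells_per_compound = len(doses) * replicates_per_dose

print([round(d, 3) for d in doses])
print(wells_per_compound)       # 6 doses x 6 replicates = 36
```

Under this scheme, every second dose is a full tenfold step, and each compound occupies 36 treated wells.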

THE DATA

RxRx19a consists of 305,520 fluorescent microscopy images and their deep learning embeddings. Each image is 1024x1024x5.

RxRx19a is the first morphological dataset that demonstrates the rescue of morphological effects of COVID-19. Through RxRx19a, researchers in the scientific community have access to both the images and the corresponding deep learning embeddings, which they can analyze or apply to their own experiments. The embeddings are 1024-dimensional vectors, one per image, produced by Recursion’s internal model, which was trained on additional cell types and perturbation modalities. We provide these embeddings so that researchers without significant compute resources can still explore the data and uncover insights. Scientific researchers can use the data to further demonstrate how high-content imaging can be used for compound efficacy screening. Results and conclusions drawn from the in vitro experiments and targeted hypothesis-driven research will contribute to the growing body of scientific data in the fight against COVID-19.
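As an illustration of how the embeddings might be explored without heavy compute, the sketch below averages replicate vectors into a single profile and compares profiles by cosine similarity. This is one common way to compare embedding vectors, not necessarily the analysis Recursion used; the toy 4-dimensional vectors stand in for the real 1024-dimensional embeddings, and all variable names are hypothetical.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of equal-length vectors (e.g. replicate embeddings)."""
    n = len(vectors)
    return [sum(values) / n for values in zip(*vectors)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 is same direction, 0 is orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional vectors standing in for 1024-dimensional embeddings.
treated_replicates = [[1.0, 0.0, 2.0, 0.0], [3.0, 0.0, 2.0, 0.0]]
mock_profile = [2.0, 0.0, 2.0, 0.0]       # hypothetical uninfected control
infected_profile = [0.0, 3.0, 0.0, 1.0]   # hypothetical active-virus control

treated_profile = mean_vector(treated_replicates)
sim_to_mock = cosine_similarity(treated_profile, mock_profile)
sim_to_infected = cosine_similarity(treated_profile, infected_profile)
print(sim_to_mock, sim_to_infected)
```

In this toy case the treated profile points the same way as the mock control and is orthogonal to the infected control, which is the kind of pattern one might look for when asking whether a compound rescues the infected phenotype.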

The RxRx19a dataset is highly similar in nature to RxRx1, a dataset released by Recursion in June 2019, although there are some key differences. For ease of comparison and understanding, we provide the following table highlighting the primary differences:

Release Date: June 2019 | August 2020 | April 2020 | August 2020 | January 2023

Cell Types: HUVEC, RPE, U2OS, HepG2 | HUVEC | HRCE, Vero | HUVEC

Stains (Channels): Hoechst, ConA, Phalloidin, Syto14, MitoTracker, WGA | Hoechst, ConA, Phalloidin, Syto14, WGA | Hoechst, ConA, Phalloidin, Syto14, MitoTracker, WGA

Plate Density: 384-well | 1536-well

Imaging Sites per Well

Perturbations Evaluated: 1,138 siRNAs | 434 soluble factors at 6 concentrations | 1,672 small molecules at 6+ concentrations; three viral conditions (active virus, irradiated, mock) | 1,856 small molecules at 4-6 concentrations in three COVID-19-associated cytokine storm conditions (severe storm, healthy, and no cytokines) | 17,063 CRISPR/Cas9-mediated gene knockouts; 1,674 compounds at 8 concentrations each

Total Number of Images: 125,510 | 131,953 | 305,520 | 70,384 | ~2.2M

Image Dimension: 512x512x6 | 1024x1024x6 | 1024x1024x5 | 2048x2048x6

Compressed Dataset Size: ~46GB | ~185GB | ~450GB | ~409GB | ~83,100GB

License: CC-BY-NC-SA | CC-BY | CC-BY-NC-SA


LICENSE

This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.

Note that this license applies only to the RxRx19a dataset, not to RxRx1.

DOWNLOAD

The dataset is available in three different parts so you can download only the part(s) that you are interested in. The deep learning embeddings provide an easy way to explore the dataset without downloading the images.


Metadata

A CSV containing the experiment design, e.g. what cell type and treatment are in each well. The schema is provided in the README.

3.6 MB Download

Deep Learning Embeddings

A large CSV file containing all of the deep learning embeddings for each image.

1.4 GB Download
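Because the embeddings file is large, it can help to stream it row by row rather than load it all at once. The sketch below uses only the Python standard library. It assumes a header row, an identifier in the first column, and the embedding values in the remaining columns; that layout is an assumption for illustration, so check the README for the actual schema. The function name `read_embeddings` is hypothetical.

```python
import csv
import io

EMBEDDING_DIM = 1024  # stated in the text: one 1024-dimensional vector per image

def read_embeddings(handle, dim=EMBEDDING_DIM):
    """Yield (identifier, vector) pairs from an embeddings CSV, one row at a time.

    Assumed layout (not confirmed by the source): a header row, then an
    identifier in the first column followed by `dim` float columns.
    """
    reader = csv.reader(handle)
    next(reader)  # skip the header row (assumed present)
    for row in reader:
        identifier, values = row[0], row[1:]
        if len(values) != dim:
            raise ValueError(f"{identifier}: expected {dim} values, got {len(values)}")
        yield identifier, [float(v) for v in values]

# Tiny synthetic stand-in so the sketch runs without the 1.4 GB download.
sample = io.StringIO("id,f0,f1,f2\nimg_a,0.1,0.2,0.3\nimg_b,1.0,2.0,3.0\n")
embeddings = dict(read_embeddings(sample, dim=3))
print(embeddings["img_b"])
```

For the real file, the same function could be called on a handle opened with `open(path, newline="")`, where `path` is wherever the CSV was saved. Using a generator keeps memory use flat, since only one row is held at a time.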

Images

1,527,600 8-bit PNG 1024x1024 images. These images are downsampled from the original 2048x2048 16-bit versions. The directory structure is explained in the README.

444 GB Download
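The file count above is consistent with each five-channel image being stored as five single-channel PNG files, which is an inference from the numbers rather than something stated explicitly. The following sketch checks that arithmetic and estimates the uncompressed pixel volume, assuming one byte per pixel for 8-bit data and decimal units (1 TB = 10^12 bytes).

```python
# Figures taken from the text.
n_multichannel_images = 305_520
channels_per_image = 5
side_pixels = 1024

# Assumption: one single-channel PNG per channel of each image.
png_files = n_multichannel_images * channels_per_image

# 8-bit pixels take one byte each before PNG compression.
raw_bytes = png_files * side_pixels * side_pixels
raw_terabytes = raw_bytes / 1e12

print(png_files)                 # 1527600
print(round(raw_terabytes, 2))   # 1.6
```

So the 444 GB download corresponds to roughly 1.6 TB of raw 8-bit pixel data, which is worth budgeting for if the images are decoded into uncompressed arrays.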
