Generative deep computer vision model

Recursion is releasing the first in a potential series of foundation models for external use (both non-commercial and commercial) hosted on NVIDIA’s new BioNeMo platform.

We call this model Phenom-Beta. It flexibly processes microscopy images into general-purpose embeddings. In other words, Phenom-Beta takes a series of microscopy images and creates a meaningful representation of each input image. These embeddings enable robust comparison of images and support other data science techniques for decoding the biology or chemistry captured in those images. Scientists can thereby systematically relate genetic and chemical perturbations to one another in a high-dimensional space, helping to determine critical mechanistic pathways and identify potential targets and drugs.



1. Recursion’s series of generative deep computer vision models.    
2. Short for phenomenal.


What is Phenom-Beta?

While Phenom-Beta was trained on the RxRx3 Cell Painting images, it is a channel-agnostic model, meaning that it can be used on a variable number of channels in any channel order. In practice, Phenom-Beta can be used on other sets of images and other protocols. Use cases could involve transfer learning to JUMP-CP, which has 8-channel images (5 Cell Painting channels and 3 brightfield channels), and the model could even be applied to histology images, which have entirely different channel characteristics from the Cell Painting assay. This means the model could be applied widely across biological applications.

In the visual above, observe that the model was trained on 6-channel Cell Painting images but is then flexibly applied to perform inference on a 3-channel image. The model is agnostic to the order of the channels; i.e., the embeddings will be approximately the same regardless of whether you pass channels ordered as [0, 1, 2] or [2, 1, 0].
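
The channel-order claim can be checked empirically on any embedding function. The following is a minimal sketch, not the Phenom-Beta API: `toy_embed` is a hypothetical stand-in that averages pixels across channels (order-invariant by construction), and `min_similarity_over_channel_orders` is an illustrative helper name. With a real model, `embed` would be replaced by the actual inference call, and the result would be expected to be close to, but not exactly, 1.

```python
import math
from itertools import permutations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def min_similarity_over_channel_orders(embed, image):
    """Worst-case similarity between the embedding of `image` and the
    embeddings of every channel permutation of it.

    `image` is a list of channels; `embed` is any callable that maps
    such a list to a vector.
    """
    reference = embed(image)
    return min(
        cosine_similarity(reference, embed([image[i] for i in order]))
        for order in permutations(range(len(image)))
    )


def toy_embed(image):
    """Hypothetical stand-in for a model call (NOT Phenom-Beta):
    the per-pixel mean across channels."""
    n_channels = len(image)
    return [sum(pixels) / n_channels for pixels in zip(*image)]


# Three channels with three "pixels" each, purely illustrative values.
image = [[0.1, 0.9, 0.3], [0.5, 0.2, 0.7], [0.8, 0.4, 0.6]]
worst_case = min_similarity_over_channel_orders(toy_embed, image)
```

A value of `worst_case` near 1 indicates that no channel ordering changed the embedding direction in any meaningful way.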

Image reconstruction is the pretraining task used to train the ViT. Below are illustrative examples of what these ViTs are capable of. ViT-L/8 (RIP-95M) is trained with ~2x the amount of data and over 3x the number of parameters. Note how the texture in the new reconstructions is much more realistic, making the input block artifacts far less noticeable.

Model specs - ViT-S/16 backbone

Phenom-Beta is a small Vision Transformer (ViT) with 16x16 patching (/16). It has roughly 25 million parameters and always produces 384-dimensional embeddings as the output (larger ViTs make larger embeddings).

Regardless of the number of channels, C, Phenom-Beta will produce a single 384-dimensional embedding for a single C x 256 x 256 input crop. This is by design, as more channels give the model more context from which to produce a single joint representation of the input image.
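
The arithmetic behind these shapes can be sketched as follows. The patch counts follow directly from a 256 x 256 crop and 16 x 16 patching. The per-channel token count is an assumption made for illustration (that each channel is split into its own patch tokens); the text above does not specify how channels are tokenized. The function names are hypothetical.

```python
EMBEDDING_DIM = 384  # fixed output size stated for Phenom-Beta


def patches_per_channel(crop_size=256, patch_size=16):
    """Number of non-overlapping patches in one square channel."""
    if crop_size % patch_size != 0:
        raise ValueError("crop_size must be a multiple of patch_size")
    per_side = crop_size // patch_size
    return per_side * per_side


def token_count(num_channels, crop_size=256, patch_size=16):
    """Total patch tokens for a C x H x W crop.

    ASSUMPTION: every channel contributes its own set of patch tokens.
    """
    return num_channels * patches_per_channel(crop_size, patch_size)


for channels in (3, 6, 8):
    print(channels, "channels:", token_count(channels), "tokens ->",
          EMBEDDING_DIM, "dimensional embedding")
```

Under that assumption, the input sequence grows with the number of channels while the output stays at 384 dimensions, which matches the idea that extra channels add context without changing the embedding size.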

Example use-case: 700-Gene known biological relationship recapitulation

Identifying relationships between biological entities (e.g., gene-gene interactions arising from protein complexes or signaling pathways) is an important use case for large-scale High Content Screening (HCS) experiments based on genetic perturbations. Distances (e.g., Euclidean or cosine) between perturbation representations are commonly used as a proxy for relationships, where a smaller distance means a stronger relationship.

To quantify this, we first correct the data for batch effects using Typical Variation Normalization (TVN) and also correct for possible chromosome arm biases known to exist in CRISPR-Cas9 HCS data. We compute the aggregate embedding of each perturbation by taking the spherical mean over its replicate embeddings. We use the cosine similarity between a pair of perturbation representations as the relationship metric, setting the origin of the space to the mean of the negative controls. We compare these similarities with the relationships found in public databases of known biological relationships. To measure the significance of the computed similarities, we use a null distribution of 25 million pairwise cosine similarities randomly sampled from the data and consider a relationship significant if its similarity falls within the x-percentile tails on either side of that distribution. Finally, we compute the fraction of known relationships identified as significant at each threshold, i.e., the recall; the higher the recall, the better the model is at capturing relationships. For reference, a random baseline would have a recall of 2x% for x-percentile tails, since a random pair lands in one of the two tails with that probability.
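
The aggregation and recall steps can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Recursion's pipeline: it skips TVN and the chromosome arm correction, centers replicates on the negative-control mean before aggregating (one possible reading of "setting the origin"), treats the spherical mean as the normalized average of unit vectors, uses nearest-rank cutoffs, and replaces the 25 million sampled similarities with a small hand-made null list. All names and values are hypothetical.

```python
import math
from itertools import combinations


def unit(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def spherical_mean(vectors):
    """Direction of the average of the unit-normalized vectors."""
    units = [unit(v) for v in vectors]
    return unit([sum(col) / len(units) for col in zip(*units)])


def cosine_similarity(a, b):
    return sum(x * y for x, y in zip(unit(a), unit(b)))


def tail_thresholds(null_similarities, x_percent):
    """Nearest-rank cutoffs for the lower and upper x-percent tails."""
    ordered = sorted(null_similarities)
    k = max(int(len(ordered) * x_percent / 100), 1)
    return ordered[k - 1], ordered[-k]


def recall(known_pairs, similarity, lower, upper):
    """Fraction of known pairs whose similarity lies in either tail."""
    hits = sum(
        1 for pair in known_pairs
        if similarity[pair] <= lower or similarity[pair] >= upper
    )
    return hits / len(known_pairs)


# Toy 2-D replicate embeddings (illustrative values only).
control_mean = [1.0, 1.0]
replicates = {
    "gene_a": [[2.0, 1.0], [3.0, 1.0]],
    "gene_b": [[4.0, 1.0], [2.5, 1.0]],
    "gene_c": [[1.0, 3.0], [1.0, 2.0]],
}

# Move the origin to the negative-control mean, then aggregate replicates.
aggregated = {
    gene: spherical_mean(
        [[x - m for x, m in zip(rep, control_mean)] for rep in reps]
    )
    for gene, reps in replicates.items()
}

similarity = {
    (g1, g2): cosine_similarity(aggregated[g1], aggregated[g2])
    for g1, g2 in combinations(sorted(aggregated), 2)
}

# Hand-made stand-in for the sampled null distribution: -1.0 to 1.0.
null_similarities = [i / 50 - 1 for i in range(101)]
lower, upper = tail_thresholds(null_similarities, x_percent=5)

# Hypothetical "known" relationships from a reference database.
known_pairs = [("gene_a", "gene_b"), ("gene_a", "gene_c")]
toy_recall = recall(known_pairs, similarity, lower, upper)
print(toy_recall)  # 0.5 in this toy setup
```

In this toy setup only the gene_a/gene_b pair falls in a tail, so half of the known relationships are recovered. Sweeping `x_percent` and recomputing the recall yields a curve of the kind described in the text.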

The plots below show the computed recall over a sweep of thresholds for three public datasets of known relationships (protein complexes). Phenom-1 is the top-performing model on all three databases, followed by Phenom-Beta. A similar ranking is observed when using only the 735 genes that are unblinded in the public RxRx3 dataset.


How do I access the Phenom-Beta inference API?

Currently, the model is available through the API and will be available through BioNeMo Beta. To get access, first apply for BioNeMo Beta, then sign the Recursion terms & conditions for non-commercial use (link for terms and conditions, link for signing terms and conditions). If you are interested in using Phenom-Beta commercially, please contact us.
