Invited Speakers
Lassi Paavolainen
Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki
Lassi Paavolainen (Ph.D.) is a Postdoctoral Fellow at the Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, and President of the CytoData Society. His current research focuses on deep convolutional neural networks (CNNs) and their applications to image-based profiling. He studies how CNNs can be used to learn unbiased representations of microscopy images of cells and tissues from large collections of image data. The results of these studies are applied in cancer research. In addition to CytoData activities, Dr. Paavolainen is currently the representative of Finland on the management committee of the COST project Network of European Bioimage Analysts (NEUBIAS) and Information Officer on the board of the Nordic Microscopy Society (SCANDEM).

Dr. Paavolainen obtained his PhD in 2013 from the University of Jyväskylä, Finland, while working in Varpu Marjomäki's research group. During his PhD, he focused on developing image analysis methods for fluorescence microscopy and electron microscopy image data, and worked as the lead software engineer of the open-source BioImageXD software project. Since 2015 he has worked at FIMM as a postdoctoral researcher in Peter Horvath's and Olli Kallioniemi's research groups. His postdoctoral research focuses on high-content image analysis of patient-derived cancer cells. In 2017, Dr. Paavolainen co-created the FIMM High-Content Imaging and Analysis core unit (FIMM-HCA) and served as the first Head of the Unit.

CytoData Society and recent advances in image-based profiling
Image-based profiling is an emerging research field combining image analysis, data science and biology to explore various biological phenomena from vast amounts of imaging data. Since the publication of the review article on data-analysis strategies (Caicedo et al., 2017) by CytoData Society members in 2017, deep learning has revolutionized many aspects of the profiling workflow, mainly the segmentation of regions of interest and the learning of complex and unexplored associations from the data. Recently published open datasets in repositories such as the Broad Bioimage Benchmark Collection, Image Data Resource and RxRx have also advanced deep learning applications. Image-based profiling is now used not only in academia but also in industry, including for drug discovery at most large pharma companies. However, continuous developments in technologies such as 3D cell cultures and highly multiplexed imaging (imaging mass cytometry, cyclic immunofluorescence) pose additional informatics challenges to the research community.

The first CytoData event (Cytomining Hackathon) was organized in 2016 at the Broad Institute by the Carpenter lab and gathered 28 participants from invited labs. The first CytoData Symposium and friendly competitive hackathon was organized in 2017 by the Bakal lab in London. The rapidly growing CytoData Society was founded in early 2018, and today the number of members has already reached 200. The CytoData Society builds and maintains an active community around image-based profiling of biological phenotypes induced by genetic, chemical or other perturbations of biological systems. The society pursues its mission through annual symposia, hackathons, and connecting members in both academia and industry. Early career researchers have been especially important since the beginning of the society, and we aim to continue educating the next generation. After CytoData2020 the society will launch a monthly webinar to disseminate recent knowledge in image-based profiling to the community.
Ben Fogelson
Ben is a senior data scientist on the drug discovery team at Recursion Pharmaceuticals. Prior to that, he was a mathematical biologist at the University of Utah, where he studied problems in cell biomechanics. Ben has written about his own work and reported on other scientific breakthroughs for Scientific American. Ben has a PhD in mathematics from the Courant Institute of Mathematical Sciences at New York University.
Deep-learning-enabled phenomics applied to COVID-19 drug discovery
The COVID-19 pandemic has upended our world and made the rapid discovery of pharmacological treatments a top global priority. Unfortunately, current preclinical systems for studying both the SARS-CoV-2 virus and COVID-19 disease have a high risk of failure in human translation, with most in vitro work being done in an immortalized African green monkey cell line (Vero E6) and few in vivo models in the early phase of the pandemic. Moreover, many of COVID-19’s terminal symptoms are not directly caused by viral infection but rather by an out-of-control inflammatory response.

In this talk, I describe recent work at Recursion to discover therapeutics for both SARS-CoV-2 infection and for COVID-19-associated cytokine storm. Recursion applied deep-learning-based image AI to rapidly develop human-cell-based assays for both of these conditions, applying phenotypic screening to deliver value despite a lack of specific knowledge regarding virus-host interaction biology and key inflammatory pathways. We deployed these assays to screen thousands of approved and clinical compounds for therapeutic benefit with minimal staff in only a few weeks. I demonstrate the utility of Recursion’s platform by discussing recent results from our screening work.
Brandon White
Spring Discovery
Brandon is a Product Lead at Spring Discovery. He was formerly Senior Product Manager at Freenome and Senior ML Engineer at Uber.
Deep-learning-enabled phenomics applied to COVID-19 drug discovery
Age-related immune dysregulation contributes to increased susceptibility to infection and disease in older adults. We combined high-throughput laboratory automation with machine learning to build a multi-phenotype aging profile that models the dysfunctional immune response to viral infection in older adults. From a single well, our multi-phenotype aging profile can capture changes in cell composition, physical cell-to-cell interaction, organelle structure, cytokines, and other hidden complexities contributing to age-related dysfunction. This system allows for rapid identification of new potential compounds to rejuvenate older adults’ immune response. We used our technology to screen thousands of compounds for their ability to make old immune cells respond to viral infection like young immune cells. We observed beneficial effects of multiple compounds, of which two of the most promising were disulfiram and triptonide. Our findings indicate that disulfiram could be considered as a treatment for severe COVID-19 and other inflammatory infections. We have started a phase 2 clinical trial to test the effects of disulfiram in patients with moderate COVID-19.
Eric Durand
Novartis Institutes for BioMedical Research
Eric obtained his PhD from the Grenoble Institute of Technology, France, where he developed Bayesian statistics methods for population genetics. He then undertook postdoctoral studies at UC Berkeley, CA, during which he contributed to the analysis of the first draft of the Neanderthal and Denisovan genomes. He moved to 23andMe, Inc. in 2011, where he developed Ancestry Composition, which became the most comprehensive ancestry inference product available to the public. He joined the Novartis Institutes for BioMedical Research in 2015, working on deep learning approaches to better characterize drug response in oncology. Since 2019, he has led the Basel oncology data science group, which collaborates with researchers and clinicians across the entire drug development cycle, from target identification to late-stage clinical trials.
Fully unsupervised deep mode of action learning for phenotyping high-content cellular images
The identification and discovery of phenotypes from high content screening (HCS) images is a challenging task. Earlier works use image analysis pipelines to extract biological features, rely on supervised training methods, or generate features with neural networks pretrained on non-cellular images. We introduce a novel fully unsupervised deep learning algorithm that clusters cellular images with a similar mode of action (MOA) together using only the images’ pixel intensity values as input. The method outperforms existing approaches on the labelled subset of the BBBC021 dataset and achieves an accuracy of 97.09% for correctly classifying the MOA by nearest-neighbor matching. One unique aspect of the approach is that it can train on the entire unannotated dataset, correctly cluster similar treatments beyond the annotated subset, and be used for novel MOA discovery.
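As an illustration of MOA classification by nearest-neighbor matching, the following minimal Python sketch assigns each treatment the MOA of its most similar treatment profile and reports the fraction classified correctly. It is not the speaker's implementation: the cosine similarity, the exclusion of same-compound neighbors, and all names and toy profiles are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nn_moa_accuracy(profiles):
    """Leave-one-out 1-nearest-neighbor MOA matching.

    `profiles` is a list of (compound, moa, vector) tuples, one per treatment.
    Neighbors from the same compound are excluded (an assumed protocol),
    so a treatment cannot be matched to another dose of itself.
    """
    correct = 0
    scored = 0
    for i, (compound, moa, vec) in enumerate(profiles):
        candidates = [
            (cosine(vec, other_vec), other_moa)
            for j, (other_compound, other_moa, other_vec) in enumerate(profiles)
            if j != i and other_compound != compound
        ]
        if not candidates:
            continue  # no eligible neighbor for this treatment
        _, predicted = max(candidates, key=lambda c: c[0])
        scored += 1
        correct += predicted == moa
    return correct / scored if scored else 0.0


# Hypothetical toy profiles: two MOAs, two compounds each.
toy_profiles = [
    ("cmpd_a", "moa_1", [1.0, 0.1, 0.0]),
    ("cmpd_b", "moa_1", [0.9, 0.2, 0.1]),
    ("cmpd_c", "moa_2", [0.0, 0.1, 1.0]),
    ("cmpd_d", "moa_2", [0.1, 0.0, 0.9]),
]
accuracy = nn_moa_accuracy(toy_profiles)
```

In practice the vectors would be per-treatment embeddings produced by the trained network rather than hand-written lists.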
Paula A. Marin Zapata
Bayer AG
Paula received her BSc degree in Biological Engineering from the National University of Colombia in Medellin in 2008. In 2010, she obtained an MSc in Applied Mathematics from Eindhoven University of Technology in Eindhoven (the Netherlands) under the sponsorship of a TU/e Talent Scholarship. After working for two years in mathematical consultancy at Sioux-LIME in Eindhoven, she enrolled at the German Cancer Research Center - Heidelberg University in Heidelberg (Germany), where she obtained a PhD in Biology in 2016. During her PhD, she focused on the development of mathematical models of cell signaling pathways and data-driven approaches to analyze high content images. In 2017, she joined Bayer AG in Berlin (Germany) as a postdoctoral fellow, where she developed deep learning methods for phenotypic profiling in plant sciences and cellular images. Paula joined the Machine Learning Research group at Bayer R&D in 2019, where she focuses on image analysis applications for drug discovery.
Cell morphology-guided de novo hit design by conditioning generative adversarial networks on phenotypic image features
Developing new bioactive compounds is time-consuming, costly and rarely successful. As a mitigation strategy, we combine, for the first time, information-rich phenotypic assays with generative adversarial networks (GANs) to approach the de novo design of small molecules. We train a GAN conditioned on morphological profiles from Cell Painting images in order to generate compounds that induce specific morphological effects. Comparison of generated molecules with known bioactives provides first evidence for the applicability of our approach to the targeted generation of small molecules. We envision that this proof of concept will encourage research on systematic molecule design based on high-content assays.
Shantanu Singh, Juan Caicedo, Gregory Way
The Broad Institute of MIT and Harvard
Shantanu Singh leads a data science group at Broad Institute's Imaging Platform. His research focuses on using images to understand human diseases and find cures.
Juan Caicedo is a Schmidt Fellow at the Broad Institute. He is pioneering the use of deep learning and machine learning methods to analyze microscopy images and high-resolution genetic data.
Gregory Way is a postdoc in the Imaging Platform at the Broad Institute. He is a biomedical data scientist interested in reducing human suffering by extracting value, maximizing knowledge, and integrating signals from various kinds of biology data types.
Fortune-telling with Images: Cell Painting for discovering drugs
Cell microscopy images contain a vast amount of information about the status of the cell. This information can be used to make valuable predictions for discovering new cures for diseases. I’ll describe our image-based fortune-telling efforts at the Broad, where we use Cell Painting to predict drug resistance, readouts of complex assays, cell health, and new indications for drugs.
Réka Hollandi
Biological Research Centre (BRC), Szeged, Hungary
Réka works on single-cell analysis in high-content screening projects, researches and develops deep learning-based methods and tools for them, and connects them to biologists.
nucleAIzer: nucleus segmentation with deep learning and image style transfer
Cellular analysis based on microscopy images starts with the identification of cells, typically by segmentation. Because downstream analysis depends on segmentation reliability, this challenges researchers to construct out-of-the-box solutions that can work across various experiments. nucleAIzer is a deep learning-based pipeline intended for the efficient and robust instance segmentation of cellular compartments, even on new image modalities for which no ground truth data is available, by adapting to them via image style transfer learning. With this technique we can generate synthetic images in the domain of the new, unknown experiments and feed this information into the training of a segmentation model, thus preparing it to cope with such images.
Alan Moses
University of Toronto
Alan M Moses is a Professor and Canada Research Chair in Computational Biology in the Departments of Cell & Systems Biology and Computer Science at the University of Toronto. His research touches on many of the major areas in computational biology, including DNA and protein sequence analysis, phylogenetic models, population genetics, expression profiles, regulatory network simulations and image analysis.
Towards generalizable image analysis for systematic cell biology
Given the diversity of imaging platforms and experimental designs, customized computational approaches are still required for most large-scale microscopy image analysis. In this talk I will outline our efforts to develop generalizable tools and concepts that we believe can be applied "out of the box" to new data. Using our work on protein localization in single-cell microscopy as an example, I will highlight challenges with the standard supervised classification paradigm, and suggest unsupervised analysis in (ideally, biologically interpretable) image feature spaces as an alternative.
Anne Carpenter
The Broad Institute of MIT and Harvard
Dr. Carpenter is an Institute Scientist at the Broad Institute of Harvard and MIT. Her research group develops algorithms and strategies for large-scale experiments involving images, including the software CellProfiler and the assay Cell Painting. Carpenter is a pioneer in image-based profiling, the extraction of rich, unbiased information from images for a number of important applications in drug discovery and functional genomics. Carpenter has been named an NSF CAREER awardee, an NIH MIRA awardee, a Massachusetts Academy of Sciences fellow (its youngest at the time), a Genome Technology “Rising Young Investigator”, and is listed in Deep Knowledge Analytics’ top-100 AI Leaders in Drug Discovery and Advanced Healthcare. She serves on the Scientific and Technical Advisory Board for Recursion.
Generating public images for profiling: the JUMP-Cell Painting Consortium
Ten pharmaceutical companies and three non-profits have come together to form the Joint Undertaking for Morphological Profiling (JUMP)-Cell Painting Consortium. Our goal is to develop and share experimental and computational best practices for image-based profiling and to create a public image-based Cell Painting dataset of over 150,000 genetic and chemical perturbations. The Consortium's latest results and public resources will be discussed.
James Taylor, Katie-Rose Skelley, Katie Heiser
Recursion, Known Medicine
James Taylor received a PhD in mathematics from Brigham Young University, applying mathematics to problems in economics, finance, and imaging. He is currently a senior data scientist at Recursion Pharmaceuticals.
Katie Heiser has her PhD in molecular and cell biology from the University of Colorado at Boulder, with specialties in virology and optical imaging. She is currently a research scientist in the biology department at Recursion Pharmaceuticals. She leads assay development and drug discovery efforts on multiple disease programs, including the COVID-19 drug repurposing project.
In June 2019, Recursion Pharmaceuticals released RxRx1, 296 GB of images of siRNA-treated cells, intended to kickstart a flurry of innovation in machine learning on large biological datasets. This dataset was designed to test machine learning algorithms for robustness against controlled biological variation. In August 2020, RxRx2 was released containing images of 434 immune stimulants in primary human cells. These data highlight how Recursion’s platform can cluster extremely complicated perturbations and cell signaling pathways. Finally, in April and August 2020 respectively, images from COVID-19 drug repurposing efforts were released in RxRx19a and RxRx19b. These four datasets were released in order to drive innovation in the field of machine learning-based drug discovery, facilitate development of novel techniques and aid therapeutic advancement, especially against the current COVID-19 global pandemic.
Kyle Brimacombe
National Institutes of Health
Kyle Brimacombe, M.S. is a program manager in the Early Translation Branch at the National Institutes of Health’s (NIH’s) National Center for Advancing Translational Sciences (NCATS).  Kyle trained as an assay biologist conducting high-throughput screening as part of the Molecular Libraries Initiative, and he now helps lead several translational science initiatives within NCATS, including the NCATS OpenData Portal for COVID-19.  Kyle received his undergraduate degree with honors from Miami University (Oxford, OH) in Zoology, with a minor in Molecular Biology, as well as a Master’s degree in Biotechnology from Johns Hopkins University. As part of his graduate work, he was awarded a JHU/NCI Molecular Targets and Drug Discovery Technologies Fellowship and worked under Dr. Michael M. Gottesman at the National Cancer Institute Laboratory of Cell Biology, where he developed dual-fluorescence cell-based assays to conduct high-throughput screening for substrates and modulators of multidrug resistance efflux transporters.
Drug Repurposing for COVID-19 through the NCATS OpenData Portal
The COVID-19 pandemic has significantly impacted global society, prompting a rapid research response from biomedical scientists around the world in a race to understand the disease, and to develop therapeutic interventions.  Though this response saw a wider adoption of open science practices, many published reports focused solely on active hits, and opted not to disclose the majority of tested compounds that were inactive.

However, this information is critical for understanding and validating disease and drug mechanisms-of-action, and for nominating repurposed and novel clinical lead candidates. To address this, the National Center for Advancing Translational Sciences (NCATS) developed an online open science data portal for COVID-19 drug repurposing campaigns – named OpenData – with the goal of making data and experimental protocols across a range of SARS-CoV-2 related assays available quickly and freely.  This approach allows researchers rapid access to drug repurposing datasets that can support subsequent mechanistic study of compounds and aid in the building of new predictive models. The OpenData Portal currently hosts single agent and drug combination screening data for a panel of biochemical and cell-based assays, and current efforts are focused on integrating and visualizing high-content cell-based screening data for SARS-CoV-2 on the OpenData Portal with the same open-minded approach.
Petr Votava
Petr is Director of AI/ML Platform Engineering at GSK, responsible for the development of the training and inference platform. He was previously Principal Architect at Genentech/Roche, responsible for high-performance computing and cloud systems. Prior to his life sciences career, Petr spent many years at NASA as a Sr. Software Engineer with a focus on machine learning and satellite image pipelines capable of processing tens of PBs at a time.
Elements of Scalable and Reusable Data Pipelines for AI/ML and Beyond
This talk will cover some of the components that make (image) data processing pipelines scalable, reusable, and reproducible. We will look at architectures from hardware to file formats to the metadata that ties it all together, and will give examples of how we are implementing these at GSK.
Travis Martin, Teresa Anderson-Meyers, Peter McLean
Travis received a PhD in Computer Science from the University of Michigan and a BS in Computer Science & Mathematics from Rice University. Before Recursion, he was in Seattle working as a software engineer for Google. He currently works as a Senior Software Engineer on the Data Engineering team, where he is building infrastructure for storing and analyzing Recursion's data.
Teresa received a BS in Biology from the University of Utah and a BS in Computer Science from Weber State University. She has worked as a software engineer for the past 6 years and is now a Senior Software Engineer at Recursion, where she loves working on the Experiment Design engineering team.
Phenomic data at scale
Recursion aims to streamline drug discovery by building a massive, relatable cellular imaging dataset to enable actionable insights on human biology.  We will present a brief overview of our data pipeline and how we have balanced streamlining it to handle millions of images per week while keeping it flexible enough to handle diverse perturbations that allow us to quickly research new diseases, cell types, treatments, and more.

Contributed talks
Osheen Sharma
Science for Life Laboratory, Stockholm-Sweden
Immunofluorescence stained segmentation of Cancer Epithelial using Nuclei Morphology
The analysis of histopathological images for a quantitative and reproducible result is a very challenging task for human observers. The traditional approach has been to use Hematoxylin and Eosin (H&E) stained images to segment the nuclei, which limits the information that can be retrieved from the tissue/cell samples. Hence, a new method is proposed in my master's thesis work to segment cancer epithelial tissue regions using only nuclei morphology, captured with immunofluorescence (IF) staining. The bladder cancer (BCa) tissue samples are stained for nuclei (DAPI) and with a marker for cancer epithelium (panEPI; a cocktail of e-cadherin and pan-cytokeratin c11 and ae1/ae3), which was used as ground truth for training the neural networks. Three popular neural networks were trained, namely U-Net, Residual U-Net and VGG16. Additionally, a transfer learning approach was tested with the VGG16 model, which was pretrained on the ImageNet dataset to enable much quicker model training.

Due to the high resolution of histopathological images, the images were divided into small tiles. The results showed promising performance by Residual U-Net, with the highest Dice score of 90.07% on the test set and 86.27% on the validation set. The ability to accurately identify cancer epithelium from nuclei morphology using a limited amount of data shows the benefit of deep convolutional neural networks for digitally augmenting IF marker panels, thereby offering improved resolution of molecular characteristics in research settings.
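For readers unfamiliar with the Dice metric reported above, here is a minimal Python sketch of how it is commonly computed for one pair of binary masks. The function name, the toy 3x3 tile, and the empty-mask convention are assumptions for illustration, not details taken from the thesis.

```python
def dice_coefficient(pred, truth):
    """Dice = 2 * |pred AND truth| / (|pred| + |truth|) for binary masks.

    Masks are given as equally sized nested lists of 0/1 values.
    """
    flat_pred = [p for row in pred for p in row]
    flat_truth = [t for row in truth for t in row]
    if len(flat_pred) != len(flat_truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(flat_pred, flat_truth) if p and t)
    total = sum(1 for p in flat_pred if p) + sum(1 for t in flat_truth if t)
    # Convention (an assumption): two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical 3x3 tile: prediction and ground truth overlap on 2 pixels.
predicted_mask = [[1, 1, 0], [0, 1, 0], [0, 0, 0]]
ground_truth = [[1, 1, 0], [0, 0, 0], [0, 0, 1]]
score = dice_coefficient(predicted_mask, ground_truth)
```

A dataset-level figure would typically average this score over all tiles, although the averaging scheme used in the work is not stated here.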
Jeremy Grignard
Institut de Recherches Servier
An end-to-end pipeline to normalize and maximize the phenotypic information from high-content data for drug screening applications
In the context of an industrial imaging-based profiling campaign, there is no standard, systematic and agnostic method to identify positive control compounds. To fully benefit from the tremendous capabilities of high-content screening (HCS) for either targeted or untargeted drug discovery assays, it is critical to automate the selection of suitable controls in order to meet two overriding objectives: (1) normalize HCS data from thousands of screening plates to allow reliable downstream analysis; and (2) maximize the phenotypic information carried by a large number of cellular features, thus enabling the analysis of both known and as yet unknown phenotypic responses.

Here, we developed a three-step end-to-end pipeline to normalize HCS data and maximize the phenotypic information that can be exploited from them: (i) an internal library of 27 chemical compounds, identified as having diverse and characterized phenotypic effects, is screened in dose response; (ii) an algorithm identifies a combination of 4 compounds, at one dose each, that maximizes the phenotypic information; (iii) the selected control compounds are used as reference points in a lower-dimensional phenotypic space to normalize HCS data from each screening plate.

The pipeline has been tested and validated on two internal projects. Overall, the proposed approach shows promising results, indicating that automating the selection of positive controls is achievable and explainable, and that the normalization step is efficient.

The presentation will describe the controls selection algorithm as well as various aspects of the validation, and finally highlight the critical need for such an approach within a long-term drug discovery pipeline.
Loan Vulliard
Max Perutz Labs Vienna
Understanding Chemical-Genetic Interactions: Morphological screens of pairwise perturbations
To maintain homeostasis, biological systems require a careful balance between diverse cellular components and their interactions. Disease states, but also therapeutic interventions, can be understood as perturbations of this system, either driving it away from homeostasis, or aiming to restore it. Yet, the key principles that govern how combinations of internal and external perturbations converge into a combined response are still to be uncovered. High-content screening allows the systematic exploration of such combinations, which can then be interpreted thanks to the tools of network biology. We therefore aim for the construction and analysis of high-resolution directed interaction networks describing context-dependent drug response.

We propose a combined theoretical and experimental approach, with an arrayed morphological screen combining genetic and chemical perturbations in SK-N-AS, a human neuroblastoma cell line. Specifically, we will quantify the changes that 350 chemical compounds and 240 CRISPR gene knockouts induce in cell morphology, based on fluorescence microscopy images. Comparing the effects of genes and drugs individually with the effect of all 84,000 possible pairwise combinations allows us to identify interactions between them. By systematically mapping out and characterizing these interactions we expect to (i) identify rules for the emergence of drug-gene interactions, (ii) obtain new insights into cell-type specific responses and (iii) deepen our understanding of how different perturbations are processed by the molecular network of the cell.

Using a vector-based approach, we can obtain a detailed landscape of directed interactions between internal and external perturbations. This will result in a perturbation interaction network that can then be explored and interpreted through the prism of the interactome. Finally, we will characterize and interpret the resulting map using diverse molecular and phenotypic datasets, ranging from functional gene annotations to known disease associations to increase our system-wide understanding of health and disease.
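To make the idea of comparing single and combined perturbation effects concrete, here is a minimal Python sketch. It checks the combination count from the screen design and computes an interaction as the deviation of the observed combined effect from an additive expectation. The additive null model, the function name, and the toy feature vectors are assumptions for illustration; the abstract does not specify how the vector-based approach is defined.

```python
def interaction_vector(drug_effect, gene_effect, combined_effect):
    """Deviation of the observed combination from an additive expectation.

    All inputs are morphological feature vectors expressed relative to
    unperturbed control cells. The additive null model is an assumption.
    """
    expected = [d + g for d, g in zip(drug_effect, gene_effect)]
    return [c - e for c, e in zip(combined_effect, expected)]


n_compounds = 350
n_knockouts = 240
n_pairs = n_compounds * n_knockouts  # every compound paired with every knockout

# Hypothetical two-feature example: the second feature changes more than
# the sum of the single-perturbation effects would predict.
deviation = interaction_vector([1.0, 0.0], [0.0, 2.0], [1.0, 3.0])
```

Under this assumed model, a deviation close to the zero vector would indicate no interaction, while the direction of a non-zero deviation would describe how the combination departs from the single effects.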
Yannick Berker
German Cancer Research Center (DKFZ)
Patient-by-patient deep transfer learning for 3D-image-based drug response profiles in pediatric tumor cells
Introduction: In pediatric oncology, drug screening of primary tumor material can improve personalized therapies. Compared to metabolic end-point measurements, image-based profiling provides higher-dimensional read-outs from fewer cells and promises additional insight based on phenotypic features. We investigate if deep learning can provide a quickly implementable solution for functional image-based drug-response analysis of 3D tumor cell cultures for clinical translation.

Methods: Tumor cells are screened against a drug library of 75 drugs in five concentrations each and imaged by high-throughput confocal microscopy after a no-wash nuclear staining protocol. For image classification, images of positive (STS) and negative (DMSO) cell death controls acquired from control cell lines are used to fine-tune an ImageNet-pretrained VGG16 image-classification network (phase I); patient-specific in-plate controls provide further specific fine-tuning opportunities (phase II). Wells are assigned STS-likeness scores (percentage of STS-like cell death), which are used to compute drug sensitivity scores indicating the effectiveness of each drug in killing tumor cells across the applied concentration range.

Results: Validation losses show the importance of both phase-I and phase-II fine-tuning. Label smoothing prevents extreme network outputs compared to no label smoothing. STS-likeness values of controls show low variability, while STS-likeness scores for the drug library range in between. Correlation between imaging replicates indicates acceptable reproducibility. Imaging-based drug scores are confirmed by visual interpretation. Notable mismatches between metabolic and imaging-based drug scores may hint at different mechanisms underlying metabolic and imaging-based read-outs, potentially indicating opportunities for drug combinations.

Conclusion: Deep transfer learning with patient-by-patient fine-tuning allows quick implementation of microscopy-based image-quantification pipelines for high-throughput drug response profiling of 3D cell cultures. Besides CNN-based STS-likeness, we apply deep learning to investigate inter- and intra-entity specific heterogeneities, and to classify tumor vs non-tumor cells and drug-specific phenotypes. We further explore multi-dye imaging and 3D single-cell analysis for phenotypic drug profiling.
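The effect of label smoothing mentioned in the Results can be illustrated with a minimal Python sketch of the standard formulation, in which each one-hot target is mixed with a uniform distribution over the classes. The smoothing factor of 0.1, the two-class layout, and the example probabilities are assumptions for illustration; the abstract does not give the settings actually used.

```python
import math


def smooth_labels(one_hot, epsilon=0.1):
    """Standard label smoothing: (1 - eps) * y + eps / K for K classes."""
    k = len(one_hot)
    return [(1.0 - epsilon) * y + epsilon / k for y in one_hot]


def cross_entropy(target, predicted):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


# Hypothetical two-class setup: index 0 = DMSO-like, index 1 = STS-like.
hard = [0.0, 1.0]
soft = smooth_labels(hard, epsilon=0.1)  # approximately [0.05, 0.95]

confident = [0.001, 0.999]  # near-saturated network output
moderate = [0.05, 0.95]     # output close to the smoothed target

loss_hard_confident = cross_entropy(hard, confident)
loss_hard_moderate = cross_entropy(hard, moderate)
loss_soft_confident = cross_entropy(soft, confident)
loss_soft_moderate = cross_entropy(soft, moderate)
```

With hard targets the near-saturated output has the lower loss, whereas with smoothed targets the moderate output does, which is why smoothing discourages extreme network outputs.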
Heba Sailem
University of Oxford
Image Analysis for Phenotyping Vascular Networks
Angiogenesis plays a key role in several diseases including cancer, ischemic vascular disease, and Alzheimer’s disease. Angiogenesis is driven by endothelial cells that have an amazing ability to self-organise into endothelial networks. Multiple imaging modalities can be used to image vascular networks both in tissue cultures and in vivo. For example, photoacoustic molecular imaging enables capturing haemoglobin flow in tumours and characterising vascular functions. However, the analysis of the resulting imaging datasets has been limited to a few phenotypic features such as the total tube length or the number of branching points. Here we developed an image analysis framework for detailed quantification of various aspects of vascular network morphology including network complexity, symmetry and topology. We apply our approach to both 2D endothelial networks and 3D in vivo tumour vascular networks. By applying our approach to a high content screen of 1,280 characterised drugs, we found that drugs that result in a similar phenotype share the same mechanism of action or common downstream signalling pathways. Our multiparametric analysis revealed that a group of drugs that target glutamate receptors results in enhanced branching and network connectivity. Using an integrative meta-analysis approach, we validated the link between these receptors and angiogenesis. We further found that the expression of these genes is associated with the prognosis of Alzheimer’s patients. In conclusion, our work shows that detailed image analysis of complex endothelial phenotypes can reveal new insights into biological mechanisms modulating the morphogenesis of endothelial networks and identify potential therapeutics for angiogenesis-related diseases.
KCML: a machine-learning framework for inference of multi-scale gene functions from image-based genetic screens
Characterising context-dependent gene functions is crucial for understanding the genetic bases of health and disease. To date, inference of gene functions from large-scale genetic perturbation screens is based on ad hoc analysis pipelines involving unsupervised clustering and functional enrichment. We present Knowledge- and Context-driven Machine Learning (KCML), a framework that systematically predicts multiple context-specific functions for a given gene based on the similarity of its perturbation phenotype to those with known function. As a proof of concept, we test KCML on three datasets describing phenotypes at the molecular, cellular and population levels and show that it outperforms traditional analysis pipelines. In particular, KCML identified an abnormal multicellular organisation phenotype associated with the depletion of olfactory receptors, and TGFb and WNT signalling genes in colorectal cancer cells. We validate these predictions in colorectal cancer patients and show that olfactory receptor expression is predictive of worse patient outcomes. These results highlight KCML as a systematic framework for discovering novel scale-crossing and context-dependent gene functions. KCML is highly generalisable and applicable to various large-scale genetic perturbation screens. We envision that KCML will transform how microscopists analyse their images to extract biological knowledge.
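The idea of assigning functions to a gene from the similarity of its perturbation phenotype to annotated ones can be illustrated with a simple nearest-neighbour vote. This is a stand-in for illustration only and is not the KCML algorithm. The profile vectors, function labels and the `predict_functions` helper are all hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length phenotype vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_functions(query, annotated, k=3):
    """Score each function by the fraction of the k most similar
    annotated perturbations that carry it.

    annotated: list of (profile, set_of_function_labels) pairs.
    """
    ranked = sorted(annotated, key=lambda item: cosine(query, item[0]),
                    reverse=True)[:k]
    votes = Counter(label for _, labels in ranked for label in labels)
    return {label: count / len(ranked) for label, count in votes.items()}


# Toy annotated screen: a gene may carry more than one function label.
annotated = [
    ([1.0, 0.0, 0.1], {"adhesion"}),
    ([0.9, 0.1, 0.0], {"adhesion", "migration"}),
    ([0.0, 1.0, 0.0], {"cell cycle"}),
    ([0.0, 0.9, 0.2], {"cell cycle"}),
]
scores = predict_functions([1.0, 0.05, 0.05], annotated, k=2)
```

Because each annotated gene can carry several labels, a query gene can receive several functions at once, each with its own score.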
Tannia Lau
UC Santa Cruz
Deep learning for the detection of potential anti-inflammatory drug leads
Upon pathogen invasion, the first line of defense is orchestrated by the rapid and dynamic inflammatory response of the innate immune system. Although necessary and critical for survival, this highly regulated response can become uncontrolled or excessive and drive disease. Unfortunately, our current supply of approved anti-inflammatory medicine is very limited and only treats a small fraction of inflammatory diseases. In order to address this issue, we developed a high-content image-based screen that utilizes a pro-inflammatory stimulus, lipopolysaccharide (LPS), and macrophages (Raw264.7) to capture bioactive compounds that are involved in pathways related to the innate immune response. We screened for compounds (~4,000 from a library of drugs and inhibitors with annotated MOAs and natural product fractions) that reverse the LPS-induced phenotype in macrophages as potential anti-inflammatory drug leads. We utilize cytological profiling techniques based on a limited staining set of seven probes (two stain sets total) for cell cycle (S-phase, mitosis), organelles (nuclei, Golgi, mitochondria), and the cytoskeleton (actin, tubulin). The generated images are used to train a deep neural network classifier on non-inflamed (-LPS) and inflamed (+LPS) macrophage controls. We ran ablation studies on the image size and different model architectures, including newly developed siamese architectures as well as state-of-the-art image classification baseline models. In summary, most of the tested models and image resolutions reach an AUC value of 1. Those that performed best are the models trained on full images, which hints that context is more important than details for this classification task. Furthermore, we could show that all stains contain valuable information, since the best classification performance is achieved when combining all features in a respective model architecture.
When testing the model on our image dataset of compounds +LPS, a visual inspection of resulting hits confirms the reversal of the LPS phenotype. Compounds that exhibit the non-inflamed phenotype are selected for dose-response and cytokine assays based on high potency, selectivity, and desirable pharmacokinetic properties. This approach has the potential to elucidate the MOAs of novel natural products relevant to inflammation and accelerate the pace of drug discovery in this therapeutic area.
Assaf Zaritsky
Ben-Gurion University of the Negev
Interpretable deep learning of label-free live cell images uncovers hallmark states of highly-metastatic melanoma
Deep learning has emerged as the technique of choice for identifying hidden patterns in cell imaging data, but is criticized for the lack of insights it provides on the machine’s prediction. Here, we demonstrate that a generative adversarial neural network captures subtle details of cell appearance that permit the prediction of the metastatic efficiency of patient-derived melanoma xenografts that reflect clinical outcome. We used the network to generate “in-silico” cell images that amplified the cellular features critical for the prediction. These images unveiled pseudopodial extensions and increased light scattering as hallmark states of metastatic cells. We validated this interpretation using live cells spontaneously transitioning between states indicative of low and high metastatic efficiency. Together, this data demonstrates how the application of Artificial Intelligence can support the identification of processes that are essential for the execution of complex integrated cell functions but are too subtle to be identified in the raw imagery by a human expert.
Assaf Nahum
Ben-Gurion University of the Negev
Quantifying the dynamics of long-range cell-cell mechanical communication
Cells feel, respond and can even remember the mechanical properties of the microenvironment. We developed a computational method to systematically measure mechanical cell-ECM interactions and long-range mechanical cell-cell communication through the extracellular matrix (ECM) and demonstrated it with simulations and 3D live imaging of fibroblasts embedded in fibrin gels. This was achieved by correlating the ECM-remodelling fluctuations of communicating cells and demonstrating that these fluctuations contain sufficient information to robustly distinguish between different pairs of communicating cells. We present two generic applications for our method. First, the ability to systematically measure the unique ECM remodelling patterns of a specific pair of communicating cells. This comes in contrast to previous studies that focused on the formation of a visible fibrous ‘band’ between communicating cells and lacked any ability to distinguish which cells are actually communicating from the many cells that have the potential to communicate, limiting the possibility to infer the complex tangle of cell-cell interactions in complicated environments. Second, we deciphered asymmetric interactions between the communicating cell pair by demonstrating that our method can identify leader-follower relationships between communicating cells. Both applications were demonstrated with extensive simulations and 3D live cell imaging. The simulations were critical for proper and extensive assessment of our method by controlling the various parameters independently in order to test and verify the sensitivity of our approach. In experiments, we used standard confocal imaging, available in almost any academic institute, highlighting the potential of being widely adopted and democratizing cell-ECM-cell communication quantification.
Our method sets the stage to measure the fundamental aspects of intercellular long-range mechanical communication in physiological contexts and may provide a new functional readout for high content 3D image-based screening, specifically in diseases where the mechanical properties of the microenvironment are established hallmarks such as cancer metastasis and fibrosis.
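One common way to turn "correlating fluctuations" and "leader-follower relationships" into numbers is a lagged cross-correlation between two time series. The sketch below assumes this approach for illustration; the abstract does not state which statistic the authors use. The two series stand for hypothetical per-frame ECM remodelling signals measured near each cell.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def best_lag(a, b, max_lag):
    """Return (lag, r) maximising corr(a[t], b[t + lag]).

    A positive lag means the signal in `a` precedes the signal in `b`,
    i.e. cell A behaves as the leader under this simple reading.
    """
    best = (0, -1.0)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = a[:len(a) - lag], b[lag:]
        else:
            x, y = a[-lag:], b[:len(b) + lag]
        r = pearson(x, y)
        if r > best[1]:
            best = (lag, r)
    return best


# Toy signals: the follower repeats the leader two frames later.
leader = [0, 1, 0, 2, 0, 3, 1, 0, 2, 0]
follower = [0, 0] + leader[:-2]
lag, r = best_lag(leader, follower, max_lag=3)
```

In the toy example the best lag is two frames, matching the delay built into `follower`. Real signals would first need detrending and a significance test against non-communicating pairs.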
Brodie Fischbacher
The New York Stem Cell Foundation
Modular deep learning enables automated identification of monoclonal cell lines
Monoclonalization refers to the isolation and expansion of a single cell derived from a cultured population. This is typically done with the aim of minimizing a cell line’s technical variability downstream of cell-altering events, such as reprogramming or gene editing, as well as for monoclonal antibody development. Without automated, standardized methods for assessing clonality post-hoc, methods involving monoclonalization cannot be reliably upscaled without exacerbating the technical variability of cell lines. We report the design of a deep learning workflow that automatically detects colony presence and identifies clonality from cellular imaging. The workflow, termed Monoqlo, integrates multiple convolutional neural networks and, critically, leverages the chronological directionality of the cell culturing process. Our algorithm design provides a fully scalable, highly interpretable framework, capable of analyzing industrial data volumes in under an hour using commodity hardware. Monoqlo standardizes the monoclonalization process, enabling colony selection protocols to be infinitely upscaled while minimizing technical variability.
Jianxu Chen
Allen Institute for Cell Science
Biologically Accurate 3D Cell and Nuclear Segmentation at Scale via Combining Training Assay and Iterative Deep Learning Approaches
Deep neural networks have been widely used for segmentation in microscopy images and have achieved great success for some problems too difficult to tackle by traditional image processing techniques. Regardless of the deep learning models (e.g., U-Net, Mask-RCNN, StarDist, etc.), the most accurate segmentation in general is still achieved by training with large sets of data with target segmentations (usually referred to as ground truth). These ground truths are commonly created by manual annotation of the pixels or voxels in images. For 3D images this annotation process is extremely time-consuming and the annotated shape lacks spatial smoothness, especially when the shape has complex morphology. More importantly, the manual annotation may be significantly different from the biologically correct ground truth. Segmentation obtained from models trained with such ground truths will be problematic for biological research where the absolute accuracy matters. In this work, we will present two methods: (1) Training Assay and (2) iterative deep learning. The Training Assay approach is a general computation-experiment co-design concept that can help create more biologically correct segmentations. Iterative deep learning is a workflow introduced in the Allen Cell Structure Segmenter and specifically designed for building training data without the need for extensive manual annotation of segmentation targets and requiring only very limited human intervention. We combined the iterative deep learning and Training Assay approaches together with additional auxiliary algorithms (e.g. mitotic daughter cell pair detection) to create a workflow to segment with high accuracy all instances of cells and nuclei in 3D microscopy images of tightly packed human induced pluripotent stem cells at scale.
This segmentation workflow created ~220,000 single cell images of 25 different cell lines in the Allen Cell Image Data Collection (based on ~18,000 field of view z-stacks), thus overcoming a fundamental challenge to performing image-based single cell analysis at scale.
Building Computational Transfer Functions on 3D Light Microscopy Images: From a General Deep Learning Toolkit to Biology-driven Validation
Cell and developmental biologists have the difficult task of identifying an optimal, balanced set of appropriate microscopy settings for their specific experiment. They must choose the microscope modality, magnifications, resolution settings, laser powers, etc. that permit the collection of the desired data, something made even more difficult if the desired data involves live imaging. Reducing the types of compromises that have to be made for these experiments permits entirely new types of datasets to be collected and analyzed. For example, if we could computationally transform the images in a long timelapse movie of a large field of view (FOV) at low magnification/resolution into images with the resolution comparable to enhanced-resolution microscopy images, this would permit analysis of a large colony of cells for a long time at high resolution. Deep learning methods have been developed to achieve transformations between microscopy images, such as image restoration, resolution enhancement, and denoising, but mostly for 2D images. Collecting high quality 3D training data is challenging as it requires pairs of images of identical samples representing the two different types of images to be transferred. In this work, we will present our open source Transfer Functions toolkit composed of two key parts: (1) a 3D registration workflow to align the training image pairs computationally and (2) a general deep learning framework based on the Conditional Generative Adversarial Network (cGAN) including an optional new Auto-Align module for improving the image pair alignment accuracy if computational alignment is not sufficient. We also present several approaches for quantitative, application-specific biology-driven validation of the prediction results.
Since the prediction will never be identical to the real target image, this type of validation is crucial to determine whether predicted images generated by deep-learning models such as this Transfer Function toolkit can be used for appropriate biological interpretation.
Jeremy Linsley
Gladstone Institutes
Super-human cell death detection with biomarker-optimized neural networks
Cell death is an essential process in biology that must be accounted for in live microscopy experiments. Nevertheless, cell death is difficult to detect without perturbing experiments with stains, dyes or biosensors that can bias experimental outcomes, lead to inconsistent results, and reduce the number of processes that can be simultaneously labelled. These additional steps also make live microscopy difficult to scale for high-throughput screening because of the cost, labor, and analysis they entail. We address this fundamental limitation of live microscopy with biomarker-optimized convolutional neural networks (BO-CNN): computer vision models trained with a ground truth biosensor that detect live cells with superhuman, 96% accuracy more than 100 times faster than previous methods. Our models learn to identify important morphological characteristics associated with cell vitality without human input or additional perturbations, and to generalize to other imaging modalities and cell types for which they have no specialized training. We demonstrate that we can interpret decisions from BO-CNN models to gain biological insight into the patterns they use to achieve superhuman accuracy. The BO-CNN approach is broadly useful for live microscopy, and affords a powerful new paradigm for advancing the state of high-throughput imaging in a variety of contexts.
Gregory Way
The Broad Institute of MIT and Harvard
Cytominer: a computational ecosystem supporting reproducible and version-controlled processing of image-based profiling experiments
There is untapped knowledge in fluorescent microscopy images of cells. Measuring cell morphology captures diverse cell states induced by the chosen experimental conditions. A major bottleneck in analyzing these data is a lack of standardized analytical pipelines and poor computational reproducibility. Here, we present the cytominer computational ecosystem as a suite of automated and modular software tools and approaches to process, evaluate, and distribute version-controlled image-based profiling readouts. The inputs to the cytominer ecosystem are single-cell morphology readouts from software such as CellProfiler or DeepProfiler. The cytominer-database package contains tools for cleaning and curating these data. The profiles are then further standardized using the pycytominer package through single cell aggregation to bulk profiles followed by bulk data normalization and morphology feature selection. The package cytominer-eval then facilitates profile quality evaluation. Configurable “recipes” enable reproducible data processing while empowering concurrent data collection. Pipelines are run within a computational framework and philosophy we call “Data Pipeline Welding”, in which we computationally “fuse” the processing recipe with experimental data; this enforces a stable, version-controlled connection between the experimental data and the pipeline code. Overall, the ecosystem produces morphological profiles that are interoperable and reproducible. It is our goal to present this framework to the community to inspire improvements to the analytical pipelines, and to present reproducible data to the wider biomedical research community.
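The three processing steps named above (single-cell aggregation, bulk normalization, feature selection) can be sketched in plain Python. This is a simplified illustration that deliberately does not use the pycytominer API. The median aggregation, z-score normalization and variance filter are assumed choices, and all function and feature names are hypothetical.

```python
from statistics import mean, median, pstdev, pvariance


def aggregate(cells, features):
    """Collapse single-cell rows into one median profile per well."""
    by_well = {}
    for cell in cells:
        by_well.setdefault(cell["well"], []).append(cell)
    return {well: {f: median(c[f] for c in group) for f in features}
            for well, group in by_well.items()}


def standardize(profiles, features):
    """Z-score each feature across wells; constant features become 0."""
    out = {well: {} for well in profiles}
    for f in features:
        values = [p[f] for p in profiles.values()]
        mu, sd = mean(values), pstdev(values)
        for well, p in profiles.items():
            out[well][f] = (p[f] - mu) / sd if sd else 0.0
    return out


def select_features(profiles, features, min_variance=1e-6):
    """Keep only features that still vary across wells."""
    return [f for f in features
            if pvariance([p[f] for p in profiles.values()]) > min_variance]


# Toy single-cell table: two wells, one informative and one constant feature.
cells = [
    {"well": "A01", "area": 10.0, "const": 1.0},
    {"well": "A01", "area": 12.0, "const": 1.0},
    {"well": "A01", "area": 14.0, "const": 1.0},
    {"well": "B01", "area": 20.0, "const": 1.0},
    {"well": "B01", "area": 22.0, "const": 1.0},
    {"well": "B01", "area": 24.0, "const": 1.0},
]
features = ["area", "const"]
bulk = aggregate(cells, features)
normalized = standardize(bulk, features)
kept = select_features(normalized, features)
```

Keeping each step as a separate function with explicit inputs mirrors the modular, recipe-driven design the abstract describes, since a recipe then only has to list the steps and their parameters.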
Therese Pacio
Center for Global Infectious Disease Research, Seattle Children’s Research Institute
CLARITY: A Python-based Image Analysis Pipeline for Subcellular Colocalization Analysis
Innovations in automated high-throughput fluorescence microscopy have enabled the acquisition of large multidimensional datasets comprising images of thousands of cells. The spatial distribution and degree of colocalization of two or more proteins, based on their fluorescence intensities, are useful metrics for phenotyping cells and informing on biological function. Yet, most commercial and open-source image analysis tools report global colocalization statistics at the image level, and offer limited analysis at the individual cell level. To address this, we are developing a Python-based image analysis pipeline to quantify robust per-cell metrics of colocalization. Our workflow includes pre-processing steps that stack image TIFF files acquired on a high-throughput automated fluorescence microscope into multichannel 3D image stacks. The images are then restored via deconvolution algorithms, and cropped to remove out-of-focus image planes using Laplacian variance algorithms. For image thresholding and cell segmentation, the pipeline incorporates scripts obtained from the Allen Institute for Cell Science to threshold and segment individual cells. Finally, the pipeline calculates per-cell colocalization metrics based on the fluorescence intensity of each voxel in each cell. The automation of per-cell image analysis leveraged in this pipeline will allow for systems-level phenotyping and data mining from fluorescence microscopy images.
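As an illustration of a per-cell colocalization metric, the sketch below computes a Pearson correlation coefficient between two channels separately for every segmented cell. It is not taken from the CLARITY pipeline: the choice of Pearson's coefficient, the flat voxel layout and the convention that label 0 marks background are all assumptions.

```python
import math
from collections import defaultdict


def per_cell_pearson(labels, ch1, ch2):
    """Pearson colocalization coefficient for each labelled cell.

    labels, ch1, ch2: flat, equal-length sequences with one entry per
    voxel (segmentation label, channel-1 and channel-2 intensity).
    Label 0 is treated as background and ignored.
    """
    voxels = defaultdict(list)
    for lab, a, b in zip(labels, ch1, ch2):
        if lab != 0:
            voxels[lab].append((a, b))

    result = {}
    for lab, pairs in voxels.items():
        n = len(pairs)
        mean_a = sum(a for a, _ in pairs) / n
        mean_b = sum(b for _, b in pairs) / n
        cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
        var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
        var_b = sum((b - mean_b) ** 2 for _, b in pairs)
        # Undefined when one channel is constant inside the cell.
        result[lab] = (cov / math.sqrt(var_a * var_b)
                       if var_a and var_b else float("nan"))
    return result


# Toy example: cell 1 is perfectly colocalized, cell 2 anti-correlated.
labels = [0, 1, 1, 1, 2, 2, 2, 0]
ch1 = [5, 1, 2, 3, 1, 2, 3, 9]
ch2 = [5, 2, 4, 6, 3, 2, 1, 9]
coloc = per_cell_pearson(labels, ch1, ch2)
```

A single image-level coefficient over the same voxels would blend the two cells and the background together, which is the limitation the abstract points out for image-level statistics.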
Lee Leavitt
University of Utah
Molecular Target Identification using Pharmaconomics
The pharmaconomic platform combines constellation pharmacology (calcium imaging + fluorescent imaging + pharmacology) with transcriptomics. Using diverse heterogeneous tissue, we obtain access to many different cell types. Our platform keeps these cells alive while we perform constellation pharmacology to define a facile classification of cell types, even rare types (< 1% of total population). Once the cell types are mapped, we pick up individual cells for single cell transcriptomics. This new combination of information identifies molecular targets with high precision. In this talk I will touch upon how we can identify drug molecular targets and disease state molecular targets by utilizing these heterogeneous cell cultures.
