By Peggy I. Wang

Mario Suvà, MD, PhD, is an assistant professor of pathology at Massachusetts General Hospital and Harvard Medical School, an Institute Member at the Broad Institute, and a board-certified neuropathologist by training.­ I spoke with Suvà about his research and how he uses single-cell RNA sequencing as a discovery tool for understanding brain cancer.

Peggy I. Wang: What brought you to the single-cell world? What originally motivated you to work with single-cell RNA sequencing (scRNA-seq)?

Mario Suvà: Brain cancers, and in particular diffuse gliomas that are the focus of my lab, have a complex underlying biology. They are heterogeneous, and their heterogeneity manifests at many levels—genetic, epigenetic, developmental, and microenvironmental.

When scRNA-seq became possible, it seemed like a perfect technique that could be leveraged to interrogate their complex make-up and better understand the cellular programs that are driving gliomas.

PIW: Could you give a quick rundown on scRNA-seq?

MS: ScRNA-seq allows you to measure the transcriptomes of individual cells in ways similar to what was done traditionally by bulk methods. This added cellular resolution has become possible in the last five or six years and scalable in the last two to three years.

While bulk RNA sequencing has informed cancer biology and helped generate subcategories of cancer, it measures an average signal that results from many cell types aggregated together and thus masks some of the diversity. Single cells are the unit of tissue (or in this case, cancer) and as such, scRNA-seq offers tremendous insights into cellular and cancer biology.

PIW: What helped scRNA-seq become more robust and widely-adopted?

MS: Droplet microfluidics techniques have recently been developed, allowing scRNA-seq to become much more scalable, going from dozens (or at most hundreds) of single cells to now thousands or tens of thousands of single cells. While a few years ago, acquiring scRNA-seq profiles was very labor-intensive, very costly and required deep expertise, now there are very robust commercial solutions that make the approach readily available, even in non-specialized laboratories.

PIW: What about cost?

MS: In droplet-based techniques, you only sequence the ends of transcripts (3’ or 5’), so you actually save quite a bit of money. Those end up costing around 50 cents per cell. However, since you typically profile thousands of cells in a single experiment, it can remain overall quite costly.

In plate-based methods in which you profile full-length transcripts, the cost is about $10 per cell. In those techniques, you would typically profile “only” a few hundred cells.

PIW: Can you explain a bit more about the different types of scRNA-seq? How might they be used differently to study cancer?

MS: The so-called, “Smart-seq2” technique provides full-length transcript information and is one of the very robust techniques around. It’s a very sensitive method and allows you to detect a very high number of unique genes per cell—typically from 5 to 6 thousand unique transcripts.

Genetic information can also be recovered, to some extent, from the full-length information: one can infer copy number alterations, detect fusion transcripts, detect genetic rearrangements (e.g., deletion of exons) and to a certain extent, detect point mutations. There are caveats with both false negatives and false positives for mutation calling on single cells, but there are solutions to improve both aspects.

Smart-seq2 is limited, however, in the number of cells you can profile—typically a few hundred for logistic and cost reasons. So, if you want to capture more diversity of cellular states in a given sample, you would prefer the large-scale droplet microfluidics techniques. Doing so enables the detection of about 1,500 unique genes per cell, but only their 3’ or 5’ ends. So less detailed information per cell, but thousands of cells per sample.

Ideally, you’d leverage a bit of both techniques: a few hundred cells with deep, full-length coverage and then a few thousand cells with more shallow coverage. Then you would have both complexity and diversity well-represented in your datasets.

PIW: That’s interesting you can use scRNA-seq to learn copy number alterations and other genetic information. What are the considerations for using RNA vs. DNA sequencing to get at the same information?

MS: Yes, you can do mutation calling from RNA, if you design your experiment well, but as I mentioned there are a few caveats:

(1) You can do so only if the gene is expressed, whereas in DNA analysis you don’t require the expression to detect it. (2) The sensitivity in detecting a genetic event is correlated to the expression level of the gene, so if the gene is highly-expressed, the mutation might be better retrieved. (3) Since you start from very little material, there are quite a few PCR amplification steps, so you have to ensure that you’re detecting a true mutation and not a PCR artifact, typically by using bulk DNA exome sequencing in parallel. So, while scRNA-seq is not tailored to discover novel mutations, you can still infer some genetic information.

PIW: What aspects of brain tumors can you study with single-cell sequencing that you could not with bulk?

MS: scRNA-seq places the cell back at the center of the cancer biology stage. Cancer is a genetic disease, but the result of a genetic mutation is that a certain cell type or program is evolved. In scRNA-seq, we measure in an integrative way the impact of mutations on the expression state and phenotype of cells.

Cellular context may affect a mutation’s impact. In the field of brain tumors, the thinking is that different cell types, such as stem/progenitor-like as well as more differentiated cells, are recapitulated by malignant cells. We can now define those cellular identities based on scRNA-seq data and integrate this with mutational status into an expression program. We can measure all these aspects of cancer cell biology coherently at cellular resolution directly in clinical samples. That is very powerful.

There is also a huge effort in characterizing immune cells in cancer and the types present in a tumor. If you’re looking at T cells for example, you can identify their activation status and infer their clonality by reconstructing the sequences of T cell receptors (TCRs). So you can begin to infer signaling pathways and TCR sequences that might be relevant for a given cancer type.

PIW: What does this mean for existing bulk sequencing data? How are we still utilizing it?

MS: The numbers of tumors profiled by scRNA-seq remain fairly small compared to existing large-scale cohorts profiled in bulk like TCGA. But you can leverage the single-cell data to interrogate bulk profiles of tumors and add granularity, such as to existing tumor classification schemes. With scRNA-seq signatures, one can tease apart effects of malignant cells, stroma, and the tumor microenvironment in the bulk cohort. You can go back and forth between profiling a few clinical specimens with scRNA-seq and interrogating existing large-scale bulk cohorts and learn from both.

PIW: What are some computational considerations or challenges for scRNA-seq?

MS: This is definitely a big bottleneck for the field. Analyzing the data still requires expertise, infrastructure and logistics. For the field to further take off, I think it will require more people to get into it and acquire the skills. I am not a computational scientist myself but am lucky to collaborate with very talented people in the field as my partners.

Differentiating the cell-to-cell variability that arises from technical artifacts versus what arises from biology is one of the challenges inherent with the technique. You will see technical variation, where some cells have worked well technically and some cells have not. You will also inherently have “drop-outs”, where you only detect a subset of genes expressed in each cell. So it can be difficult to separate a gene that is not detected because it is not expressed in a given cell from a gene that is expressed but was not captured for technical reasons.

Looking at a larger set of genes is one way around this. For example, if you look at one gene involved in cell cycle, you will typically detect it in a subset of cells but not in most cells, and it will be hard to conclude if the cells in which you do not detect the gene might be still cycling. But if you interrogate entire sets of genes involved in cell cycle (typically dozens of genes) to infer the cell cycle activity of any given single cell, that becomes a much more robust analysis. You can do this for any coherent biological phenomenon with enough genes.

PIW: Are you using previously-defined gene sets, or external data sets to inform these gene set expression analyses?

MS: You can leverage external data sets, but deriving signatures from single-cell data itself has the power of generating de novo knowledge. For example, in gliomas, we’ve been able to derive signatures for stem/progenitor cells and differentiation programs towards more mature glial cell types. The genes for those cellular states were determined from our own single-cell data.

The best way is always to combine different approaches; use some of your own data and also leverage external datasets. Additional layers add confidence to your study, and I would say this is true across many fields of research.

As the cancer community generates single-cell data in tumors (e.g., “Tumor Atlas”), the “Human Cell Atlas” community generates single-cell data in normal human tissues. Other scientists are characterizing model organisms by single-cell. As these datasets become available, one can go back and forth between normal, cancer, development, and model systems and use each dataset to learn from each other.

PIW: How might single-cell technologies in general change things for patients?

MS: A large body of knowledge is coming out of single-cell studies in cancer. It’s a necessary first step before translating findings into actionable therapeutics, which is difficult to predict.

Markers may emerge for targeted assays that might be implemented in the clinic—assays that inform that a certain cell type or certain cell state is responding (or not) to a specific intervention. So, while we might not do scRNA-seq in every patient, the knowledge that we’ve derived from scRNA-seq might be used widely.

Knowing which cell types are being eradicated by therapy and which resist and recur in a heterogeneous tumor is very informative. Single-cell analysis is very powered to identify the programs that emerge out of resistance to any type of therapy.

PIW: What are some of the big projects or questions you’re most excited to work on now?

MS: We are trying to build a coherent model that explains how different genetic influences affect different cell types in gliomas and how the proportion of different cell states and their associated genetics make up a patient’s specific glioma subtype. We’re beginning to better integrate glioblastoma stem cells, glioblastoma expression profiling, subtyping, and genetic determinants into our model.

How different therapies affect the cellular compartments we’ve identified and characterized is where we are going next. What compartments are left behind and what needs to be done to eradicate them? As such, we’re trying to do more single-cell analyses in clinical recurrences. We are part of a national effort spearheaded by NCI to analyze matched pairs of primary and recurrent disease. This has become possible because of recent technological developments enabling single-cell analysis of frozen samples.

Overall, being able to dissect all the building blocks of cancer and understanding them in a very detailed fashion is kind of a Holy Grail. If you had told me just a few years ago that we would be measuring single cells in a cancer in such high throughput, I would have doubted that. It’s just amazing to see this revolution happening and to be able to contribute to it.

This post was originally published on January 15, 2019, by the National Cancer Institute. It is republished with permission.