R Tutorial: What is Single Cell RNA-Seq, and why is it useful?

Published: 21 March 2020
on channel: DataCamp

Want to learn more? Take the full course at https://learn.datacamp.com/courses/si... at your own pace. More than a video, you'll learn hands-on coding & quickly apply skills to your daily work.

---
My name is Fanny Perraudeau, and I'm your instructor for this course on single-cell RNA-sequencing workflows. In this course, you'll learn what single-cell RNA-sequencing is and why it is useful for many applications in biology and medicine. You'll also learn how to analyze single-cell data in R.

This first chapter is an introduction to the rest of the course.

Today it is possible to obtain genome-wide transcriptome data from single cells using high-throughput sequencing. This is what we call single-cell RNA-sequencing, or scRNA-Seq. Its main advantage is that it allows researchers to measure gene expression levels at the resolution of individual cells. In that sense, single-cell RNA-sequencing is like a bowl of fruit, where each piece of fruit is a cell whose type can be identified.

Why is it amazing? Because the cellular resolution and the genome-wide scope offer an unprecedented opportunity to investigate fundamental biological questions at the cellular level, such as stem cell differentiation or the discovery and characterization of rare cell types. It makes it possible to address questions that are intractable with other methods, such as bulk RNA-sequencing.

With bulk RNA-sequencing, the technology that was developed before single-cell sequencing, you get a gene expression profile averaged over all the cells in the sample. Continuing the fruit analogy, bulk RNA-Seq is like a smoothie: you sample a mixture of blended fruit, which gives you an average signal over all the cells in the sample. So, with bulk RNA-sequencing, it is not possible to identify the different cell types within one sample.
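To make the smoothie analogy concrete, here is a minimal sketch on made-up numbers. The course itself works in R; this plain-Python version only illustrates the arithmetic. The gene names, cell-type labels, and counts are all hypothetical. A bulk profile is modeled as the average of each gene over all cells, and the per-type profile as the average within each cell type, which is only possible when you have per-cell measurements.

```python
# Hypothetical toy counts: 3 genes (rows) x 4 cells (columns).
genes = ["GeneA", "GeneB", "GeneC"]
cell_types = ["red", "red", "blue", "blue"]  # hypothetical cell-type labels
counts = [
    [10, 12, 0, 0],  # GeneA: expressed only in "red" cells
    [0, 0, 8, 10],   # GeneB: expressed only in "blue" cells
    [5, 5, 5, 5],    # GeneC: expressed in every cell
]


def bulk_profile(matrix):
    """Average each gene (row) over all cells, like a bulk 'smoothie' measurement."""
    return [sum(row) / len(row) for row in matrix]


def per_type_profile(matrix, labels):
    """Average each gene within each cell type; this needs per-cell data."""
    profiles = {}
    for cell_type in sorted(set(labels)):
        cols = [j for j, lab in enumerate(labels) if lab == cell_type]
        profiles[cell_type] = [sum(row[j] for j in cols) / len(cols) for row in matrix]
    return profiles


bulk = bulk_profile(counts)
by_type = per_type_profile(counts, cell_types)
for gene, avg in zip(genes, bulk):
    print(f"{gene}: bulk average = {avg}")
print(by_type)
```

The bulk averages (5.5 for GeneA, 4.5 for GeneB) suggest both genes are moderately expressed throughout the sample, while the per-type averages show that each one is specific to a different cell type.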

Why is single-cell RNA-sequencing a revolution in biology? Well, because it has many applications, especially in cancer, microbiology, and neurology. For example, in personalized cancer medicine, it could enable researchers to identify individual clones and biomarkers in a tumor and select precision drugs for each of them. This is not possible with bulk RNA-sequencing, where you get an average gene expression profile of all the cells in the tumor. This is what the figure here represents: with single-cell RNA-sequencing, the red, blue, and purple cells have been identified as different cell types and could be targeted separately by different drugs.

With single-cell RNA-Seq, the data you get from the lab after preprocessing is a large matrix with genes as rows and cells as columns. Each entry of the matrix is a count: the number of reads aligned to a given gene in a given cell, where a read is a sequence of nucleotides (A, T, C, G). You also get two other matrices, containing the cell-level and gene-level covariates. Gene-level covariates include, for example, the length of each gene or its GC content, which is the percentage of nucleotides in the gene that are G or C. Cell-level covariates include, for example, quality-control measures for each cell or the batch in which each cell was sequenced.
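The three tables described above can be sketched as plain Python data structures. This is a minimal illustration, not the course's R code: every gene name, cell name, sequence, count, and batch label below is hypothetical, and total counts per cell are used as one example of a quality-control measure. The `gc_content` helper is also hypothetical and computes the percentage of G and C among all nucleotides of a sequence.

```python
def gc_content(sequence: str) -> float:
    """Percentage of G and C among all nucleotides (A, T, C, G) in a sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical toy sequences; real genes are far longer.
gene_sequences = {"GeneA": "ATGC", "GeneB": "ATAT", "GeneC": "GGCCGCAT"}
gene_names = list(gene_sequences)
cell_names = ["cell1", "cell2", "cell3"]

# Count matrix: one row per gene, one column per cell (hypothetical read counts).
counts = [
    [3, 0, 7],    # GeneA
    [0, 0, 2],    # GeneB
    [12, 9, 15],  # GeneC
]

# Gene-level covariates: one record per row of the count matrix.
gene_covariates = {
    gene: {"length": len(seq), "gc_percent": gc_content(seq)}
    for gene, seq in gene_sequences.items()
}

# Cell-level covariates: one record per column, here the sequencing batch
# and the total number of reads in the cell.
batches = ["batch1", "batch1", "batch2"]
cell_covariates = {
    cell: {"batch": batch, "total_counts": sum(row[j] for row in counts)}
    for j, (cell, batch) in enumerate(zip(cell_names, batches))
}

print(gene_covariates)
print(cell_covariates)
```

The key design point is alignment: gene-level covariates line up with the rows of the count matrix and cell-level covariates with its columns. In R, one common container for this layout is Bioconductor's `SingleCellExperiment` class, which stores the counts as an assay, gene-level covariates in `rowData`, and cell-level covariates in `colData`.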

Now, in the count matrix, there are many more zeros than in bulk RNA-sequencing data. Some zeros are biological: the gene is simply not expressed in that cell. For example, genes involved in cell division are not expressed at every step of the cell cycle. Other zeros are technical: the gene is expressed in the cell, but its transcripts were not captured or sequenced, so you observe a zero in the count matrix instead of the actual count. These technical zeros are what people call dropouts.
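The difference between biological and technical zeros can be illustrated with a toy simulation. This is a minimal sketch under a simplifying assumption that is not from the course: each molecule present in a cell is independently observed with a fixed, hypothetical `capture_rate`. The "true" counts and the helper functions are hypothetical. A gene with a true count of zero always stays zero (a biological zero), while expressed genes can also come out as zero (dropouts).

```python
import random


def zero_fraction(matrix):
    """Fraction of entries in a genes x cells count matrix that are zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total


def simulate_capture(true_counts, capture_rate, seed=0):
    """Toy dropout model (an assumption for illustration): each molecule is
    kept independently with probability `capture_rate`."""
    rng = random.Random(seed)
    return [
        [sum(1 for _ in range(n) if rng.random() < capture_rate) for n in row]
        for row in true_counts
    ]


# Hypothetical true molecule counts, genes x cells. The 0 for the second
# gene in the first cell is a biological zero: the gene is not expressed.
true_counts = [
    [4, 1, 2],
    [0, 3, 1],
]
observed = simulate_capture(true_counts, capture_rate=0.3)
print("true zero fraction:", zero_fraction(true_counts))
print("observed zero fraction:", zero_fraction(observed))
print("observed counts:", observed)
```

Any zero in `observed` whose true count is positive is a dropout. Note that in real data you only see the observed matrix, so a given zero cannot be labeled biological or technical from the counts alone.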

Now let's start playing with a single-cell RNA-Seq dataset in R!
