1 Preparation

Most of the code in this document was executed at the time of it’s build. For the code and functions to work, the prerequisites required are data and the tools (environment).

1.1 Environment

The environment and tools are setup using a Conda environment. Install miniconda on your system and create a conda environment using the environment.yml file. (Optionally, installing mamba is recommended instead of conda commands for faster resolving time.). This step can take a while. Finally, activate the environment.

conda env create -f environment.yml
conda activate r-4.0-endo

Next steps are run in R. Install R package renv.

options(repos="http://cran.r-project.org")
install.packages("renv")

It’s a good idea to make sure that the library path points to the conda environment.

.libPaths()
"/home/user/miniconda3/envs/r-4.0-endo/lib/R/library"

Then use renv and renv.lock file to install all the R packages needed for analyses.

renv::restore(lockfile="renv.lock")

This step can take a while.

1.2 Directory

The initial directory structure required before commencing analyses is shown below. This document (index.Rmd) is executed in the project directory.

project/
├── index.Rmd
├── data/
│   ├── processed/
│   │   ├── fetal-maternal/
│   │   │   └── emtab-6701.Rds
│   │   ├── labels-cellsubtype.Rds
│   │   ├── labels-celltype.Rds
│   │   └── velocyto/
│   │       ├── 10X_17_107.loom
│   │       ├── 10X_17_109.loom
│   │       └── 10X_18_008.loom
│   └── raw/
│       ├── 10X_17_107/
│       ├── 10X_17_109/
│       └── 10X_18_008/
├── environment.yml
└── renv.lock
  • index.Rmd: This report source file
  • data/: All the data files
  • data/raw: Raw data (CellRanger output/10X files)
  • data/processed/: All intermediates and outputs
  • labels-celltype.Rds, labels-celltype.Rds: Custom cell type and cell subtype labels
  • velocyto: Loom files with spliced, unspliced and ambiguous counts
  • fetal-maternal: External data
  • environment.yml: Conda environment file
  • renv.lock: R package info to restore packages using renv

1.3 Load R packages

set.seed(123)

# load libraries
library(Seurat) # single-cell
library(sctransform) # normalisation
library(MAST) # differential gene expression
library(SoupX) # estimate and correct background rna

# automated cell type identification
library(celldex) # reference data
library(SingleR) # cell type identification
library(scater) # sc framework

library(BiocManager) # managing bioconductor packages

# rna velocity
# remotes::install_github("velocyto-team/velocyto.R")
library(velocyto.R)
# remotes::install_github('satijalab/seurat-wrappers')
library(SeuratWrappers)

# external data integration
library(org.Hs.eg.db) # gene id conversion
library(clusterProfiler) # gene id conversion

# general
library(tidyr) # data wrangling
library(stringr) # data wrangling
library(dplyr) # data wrangling
library(ggplot2) # plotting
library(patchwork) # combining ggplot images
library(pheatmap) # plotting heatmaps
library(readxl) # read excel files
library(writexl) # write excel files

path_wd <- getwd()
path_export <- file.path(path_wd,"data/processed")

2 QC

2.1 Background

SoupX is used to estimate background RNA contamination in each cell/droplet.

library(SoupX)

#' @title run_soupx
#' @description Runs soupx on 10x data data
#' @param path path to 10X data.
#' @return Returns a list of soupx object and corrected counts matrix
#' 
run_soupx <- function(path=NULL) {
  
  message("Running pre-processing & clustering ...")
  x <- CreateSeuratObject(counts=Seurat::Read10X(file.path(path,"outs/filtered_gene_bc_matrices/hg19"),strip.suffix=F),
                     min.cells=0,min.features=0,project="endo") %>%
    NormalizeData() %>%
    ScaleData() %>%
    FindVariableFeatures() %>%
    RunPCA(verbose=FALSE) %>%
    RunTSNE(dims=1:15) %>%
    FindNeighbors() %>%
    FindClusters()
  
  message("Running soupx ...")
  y <- load10X(file.path(path,"outs")) %>%
        setClusters(setNames(Idents(x), colnames(x))) %>%
        setDR(slot(x@reductions[["tsne"]],"cell.embeddings")) %>%
        autoEstCont(doPlot=FALSE)
  
  z <- adjustCounts(y,roundToInt=TRUE)
  
  return(list("soupx"=y,
              "corrected"=z))
  
}
paths <- list.dirs(file.path(path_wd,"data/raw"),recursive=F)
names(paths) <- basename(paths)

for(i in 1:length(paths)) {
  message("Running ",names(paths)[i],"...")
  temp <- run_soupx(paths[i])
  saveRDS(temp,paste0(path_export,"/soupx/soupx-",names(paths)[i],".Rds"))
  DropletUtils:::write10xCounts(file.path(path_export,"soupx",names(paths)[i]),temp$corrected,overwrite=TRUE)
}

Rho estimates

10X_17_107       0.01
10X_17_109_rerun 0.01
10X_18_008_rerun 0.03

SoupX is used to estimate background RNA contamination and not used to correct the counts.

2.2 Read data

The 10x data is read into Seurat.

# read data
paths <- file.path(path_wd,c("data/raw/10X_17_107/outs/filtered_gene_bc_matrices/hg19/",
           "data/raw/10X_17_109_rerun/outs/filtered_gene_bc_matrices/hg19/",
           "data/raw/10X_18_008_rerun/outs/filtered_gene_bc_matrices/hg19/"))
vec <- c("s1"=paths[1],"s2"=paths[2],"s3"=paths[3])
eu <- CreateSeuratObject(counts=Read10X(data.dir=vec),min.cells=10,min.features=200,project="endo")
eu
## An object of class Seurat 
## 18383 features across 6864 samples within 1 assay 
## Active assay: RNA (18383 features, 0 variable features)
p1 <- eu[[]] %>%
 tibble::rownames_to_column("cell") %>%
  dplyr::select(orig.ident,nCount_RNA,cell) %>%
  group_by(orig.ident) %>%
  arrange(-nCount_RNA) %>%
  mutate(rank=1:n()) %>%
  ggplot(aes(x=rank,log10(nCount_RNA),group=orig.ident,colour=orig.ident))+
  geom_line()+
  labs(x="Ranked Cells",y="Log10 Reads per cell")+
  theme(legend.position="top")

p2 <- eu[[]] %>%
  tibble::rownames_to_column("cell") %>%
  dplyr::select(orig.ident,nCount_RNA,cell) %>%
  group_by(orig.ident) %>%
  arrange(-nCount_RNA) %>%
  mutate(rank=1:n()) %>%
  mutate(rank=scales::rescale(rank,to=c(0,1))) %>%
  mutate(reads=scales::rescale(log10(nCount_RNA),to=c(0,1))) %>%
  ggplot(aes(x=rank,reads,col=orig.ident))+
  geom_line()+
  labs(x="Ranked Cells Scaled",y="Reads per cell scaled")+
  theme(legend.position="top")

wrap_plots(p1,p2,guides="collect")&
  theme(legend.position="top")

Fig. 1: UMI curve for the three samples. Left shows absolute and Right shows scaled values.

2.3 Mitochondrial

Add percentage of mitochondrial counts.

eu$percent_mito <- PercentageFeatureSet(eu,pattern="^MT-")
eu[[]] %>%
  select(nCount_RNA,nFeature_RNA,percent_mito,orig.ident) %>%
  pivot_longer(!orig.ident,names_to="metric",values_to="value") %>%
  ggplot(.,aes(value,fill=orig.ident))+
  geom_histogram()+
  facet_wrap(~metric,scales="free")

Fig. 2: Histograms of gene counts, number of genes detected and percentage of mitochondrial counts in the non-integrated RNA assay.

p1 <- FeatureScatter(eu, feature1="nCount_RNA", feature2="percent_mito")
p2 <- FeatureScatter(eu, feature1="nFeature_RNA", feature2="percent_mito")
p3 <- FeatureScatter(eu, feature1="nCount_RNA", feature2="nFeature_RNA")
wrap_plots(p1,p2,p3,ncol=2)

Fig. 3: ( Top Left) Number of reads per cell vs percentage of mitochondrial reads per cell. (Top Right) Number of genes per cell vs percentage of mitochondrial reads per cell. (Bottom Left) Number of reads per cell vs number of genes detected per cell.

2.4 Cell cycle

# Cell cycle phases using seurat
s.genes <- cc.genes$s.genes
g2m.genes <- cc.genes$g2m.genes
eu <- CellCycleScoring(eu,s.features=s.genes,g2m.features=g2m.genes,set.ident=TRUE)
eu$cc.diff <- eu$S.Score-eu$G2M.Score
as.data.frame(table(eu$Phase))
table(eu$orig.ident,eu$Phase)
##     
##        G1  G2M    S
##   s1 1796   59  467
##   s2 1243   46  458
##   s3 2237   45  513
eu[[]] %>%
  ggplot(aes(x=S.Score,y=G2M.Score,color=Phase))+
  geom_point(size=0.8,alpha=0.6)+
  labs(x="s",y="g2m",title="Seurat")

Fig. 4: Scatterplot showing S.Scores and G2M.Scores as well as assigned phases.

saveRDS(eu,file.path(path_export,"seurat-1-raw.Rds"))

2.5 Filtering

Remove ERCC, RPL and RPS genes. And remove cells that are doublets, have less than 200 features, more than 5000 features and percentage mitochondrial reads greater than 10%.

eu <- subset(eu,features = grep(pattern="^ERCC|^RPL|^RPS",x=rownames(as.matrix(GetAssayData(eu))),value=TRUE,invert=T))
eu <- subset(eu, subset=nFeature_RNA > 200 & nFeature_RNA < 5000 & percent_mito < 10)
eu <- subset(eu,cells=names(readRDS(file.path(path_export,"labels-celltype.Rds"))))
eu
## An object of class Seurat 
## 18276 features across 6348 samples within 1 assay 
## Active assay: RNA (18276 features, 0 variable features)

3 Data integration

The three samples are integrated to remove the effect of individuals.

library(sctransform)
eu.list <- SplitObject(eu,split.by="orig.ident")

for (i in 1:length(eu.list)) {
    eu.list[[i]] <- SCTransform(eu.list[[i]],return.only.var.genes=F,
                                vars.to.regress=c("percent_mito","cc.diff"))
}

eu.features <- SelectIntegrationFeatures(eu.list,nfeatures=3000)
eu.list <- Seurat::PrepSCTIntegration(eu.list,anchor.features=eu.features)
eu.anchors <- FindIntegrationAnchors(object.list=eu.list,normalization.method="SCT",anchor.features=eu.features)
e <- IntegrateData(anchorset=eu.anchors,normalization.method="SCT")
e$nFeature_INT <- Matrix::colSums(GetAssayData(e,assay="integrated")>0)
e$nCount_INT <- Matrix::colSums(GetAssayData(e,assay="integrated"))
DefaultAssay(e) <- "integrated"
Idents(e) <- e$orig.ident
saveRDS(e,file.path(path_export,"seurat-2-integrated.Rds"))
rm(eu,eu.anchors,eu.features,eu.list)
e
## An object of class Seurat 
## 38933 features across 6348 samples within 3 assays 
## Active assay: integrated (3000 features, 3000 variable features)
##  2 other assays present: RNA, SCT

4 Dimensionality reduction

Dimensionality reduction methods including PCA, tSNE and UMAP.

e <- RunPCA(e,assay="integrated",verbose=F,seed.use=42)
PCAPlot(e)

Fig. 5: PCA scatterplot coloured by sample.

ElbowPlot(e,ndims=50)

Fig. 6: Elbow plot to estimate number of PCs to retain in downstream analyses.

e <- RunTSNE(e,dims=1:18,seed.use=100)
e <- RunUMAP(e,dims=1:18,seed.use=100)

p1 <- TSNEPlot(e,group.by="orig.ident")
p2 <- UMAPPlot(e,group.by="orig.ident")
(p1|p2)

Fig. 7: Left: tSNE scatterplot showing the identity of the three samples. Right: UMAP scatterplot showing the identity of the three samples.

p1 <- FeaturePlot(e,red="tsne",features="nFeature_RNA")
p2 <- FeaturePlot(e,red="tsne",features="percent_mito")
p3 <- FeaturePlot(e,red="tsne",features="S.Score")
p4 <- FeaturePlot(e,red="tsne",features="G2M.Score")
(p1|p2)/(p3|p4)

Fig. 8: tSNE plot showing; Top left: number of genes detected per cell, Top right: percentage mitochondrial content in cells, Bottom Left: S phase cells, Bottom Right: G2/M phase cells.

5 Cell type

5.1 Clustering

A network graph is built using nearest neighbors. Louvain clustering is used to find communities/clusters on this graph.

e <- FindNeighbors(e,dims=1:20)
e <- FindClusters(e,resolution=0.2,random.seed=0)
TSNEPlot(e)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 6348
## Number of edges: 221438
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8859
## Number of communities: 10
## Elapsed time: 0 seconds

Fig. 9: tSNE scatterplot with clusters identified by Seurat (Louvain clustering).

5.2 Manual

Manual cell type identification using known marker genes.

VlnPlot(e,features=c("VWF","EPCAM","PTPRC","CXCL12","CD14","CD27",
                     "RGS5","TYMS","TOP2A","GNLY"),pt.size=0)

Fig. 10: Violin plot showing expression of known marker genes in various clusters.

FeaturePlot(e,features=c("VWF","EPCAM","PTPRC","CXCL12","CD14","CD27"),
            reduction="tsne")&
  theme(legend.position="top",
        legend.direction="horizontal",
        legend.justification=0.5)

Fig. 11: tSNE scatterplot showing expression of known marker genes in various clusters. The colour denotes the level of expression with bright blue indicating higher expression in that cell.

5.3 SingleR

Automated cell type identification using SingleR. SingleR was run using two different reference datasets: HPCA (Human Primary Cell Atlas) and BPE (Blue Print ENCODE).

set.seed(123)
library(celldex)
hpca <- celldex::HumanPrimaryCellAtlasData()
bpe <- celldex::BlueprintEncodeData()
  
library(scater)
library(SingleR)

pred_hpca <- SingleR(test=GetAssayData(e,slot="data",assay="integrated"),ref=hpca,labels=hpca$label.main)
table(pred_hpca$labels)
e$hpca <- pred_hpca$labels

pred_bpe <- SingleR(test=GetAssayData(e,slot="data",assay="integrated"),ref=bpe,labels=bpe$label.main)
table(pred_bpe$labels)
e$bpe <- pred_bpe$labels

rm(hpca,bpe,pred_hpca,pred_bpe)
## 
##            Astrocyte               B_cell                   BM 
##                   88                   77                  223 
##           BM & Prog.         Chondrocytes                   DC 
##                   21                  113                   64 
## Embryonic_stem_cells    Endothelial_cells     Epithelial_cells 
##                   71                  128                  559 
##         Erythroblast          Fibroblasts          Gametocytes 
##                  219                   69                 1315 
##                  GMP          Hepatocytes           HSC_-G-CSF 
##                    4                  232                  218 
##            iPS_cells        Keratinocytes           Macrophage 
##                   30                  380                  226 
##                  MEP             Monocyte                  MSC 
##                    8                  155                  276 
##            Myelocyte Neuroepithelial_cell              Neurons 
##                  332                  234                  393 
##          Neutrophils              NK_cell          Osteoblasts 
##                  290                   54                   47 
##            Platelets     Pre-B_cell_CD34-     Pro-B_cell_CD34+ 
##                  141                   20                   25 
##        Pro-Myelocyte  Smooth_muscle_cells              T_cells 
##                    5                   44                  211 
##    Tissue_stem_cells 
##                   76 
## 
##        Adipocytes        Astrocytes           B-cells      CD4+ T-cells 
##                18               118               117               227 
##      CD8+ T-cells      Chondrocytes                DC Endothelial cells 
##               403                66                 4               194 
##       Eosinophils  Epithelial cells      Erythrocytes       Fibroblasts 
##                97               293               397               120 
##               HSC     Keratinocytes       Macrophages       Melanocytes 
##                35               201               613                36 
##   Mesangial cells         Monocytes          Myocytes           Neurons 
##                32               201               147              1493 
##       Neutrophils          NK cells         Pericytes   Skeletal muscle 
##               698               280               175               268 
##     Smooth muscle 
##               115
TSNEPlot(e,group.by="hpca")+
  guides(colour=guide_legend(nrow=12,override.aes=list(size=2)))+
  theme(legend.position="top")

Fig. 12: tSNE scatterplot showing cell types predicted by SingleR using HPCA (Human Primary Cell Atlas) reference.

TSNEPlot(e,group.by="bpe")+
  guides(colour=guide_legend(nrow=9,override.aes=list(size=2)))+
  theme(legend.position="top")

Fig. 13: tSNE scatterplot showing cell types predicted by SingleR using BPE (Blue Print ENCODE) reference.

5.4 Cluster labels

The final high level cluster labels are designated based on the previous information. In our case, we will read in predefined labels.

e$cell_type <- readRDS(file.path(path_export,"labels-celltype.Rds"))
e$cell_subtype <- readRDS(file.path(path_export,"labels-cellsubtype.Rds"))

Idents(e) <- e$cell_type
TSNEPlot(e)

Fig. 14: tSNE and UMAP scatterplots showing the high level designated labels on clusters.

5.5 Marker discovery

Markers are discovered for the defined clusters/cell types through differential gene expression (DE). DE comparisons are performed between each cluster vs all other cells.

m <- FindAllMarkers(e,test.use="MAST",only.pos=TRUE,min.pct=0.25,
                 logfc.threshold=1,max.cells.per.ident=500,
                 min.cells.group=10,random.seed=100,return.thresh=0.05,
                 assay="SCT",slot="data")

m1 <- m %>% 
  filter(p_val_adj<0.05) %>%
  group_by(cluster) %>% 
  top_n(n=5,wt=abs(avg_logFC)) %>%
  arrange(cluster,-abs(avg_logFC))
# dotplot
DotPlot(e,features=m1$gene)+
  coord_flip()+
  labs(x="",y="")+
  theme_bw(base_size=7)+
  theme(axis.text.x=element_text(angle=45,hjust=1))

Fig. 15: Dotplot showing top 5 (ranked by avg log2 fold change) marker genes discovered through DE. Colour denotes mean expression level. Stronger colour denotes higher expression. The size of the circles denote precentage of cells within that cluster that show expression.

DoHeatmap(subset(e,downsample=200),features=m1$gene,)

Fig. 16: Top 5 (as in previous figure) marker genes discovered through DE is now shown as a heatmap. Rows denote genes and columns denote cells. Cells have been downsampled to maximum 200 cells per cluster to show a balanced view of all clusters.

Overview of number of cells and categories.

table(e$orig.ident)
table(e$cell_type)
table(e$cell_subtype)
table(e$orig.ident,e$cell_type)
table(e$orig.ident,e$cell_subtype)
## 
##   s1   s2   s3 
## 2159 1610 2579 
## 
##         Stromal        Pericyte         Immune1      Epithelial         Immune2 
##            5663             214             182              90              71 
## Cycling Stromal     Endothelial 
##              68              60 
## 
##              ACTA2+               BMP7+             CTNNB1+     Cycling Stromal 
##                 673                 329                 241                  82 
##          ECM Stroma         Endothelial          Epithelial             Immune1 
##                 154                  60                  91                 179 
##             Immune2              ISG15+              PAGE4+           Pericyte1 
##                  75                 131                 238                 142 
##           Pericyte2 Stroma_Unclassified               THY1+ 
##                  73                3690                 190 
##     
##      Stromal Pericyte Immune1 Epithelial Immune2 Cycling Stromal Endothelial
##   s1    1935       65      67         29      18              30          15
##   s2    1366       93      60         25      26              23          17
##   s3    2362       56      55         36      27              15          28
##     
##      ACTA2+ BMP7+ CTNNB1+ Cycling Stromal ECM Stroma Endothelial Epithelial
##   s1    257   147      62              37         56          15         29
##   s2    161    37      64              29         35          17         25
##   s3    255   145     115              16         63          28         37
##     
##      Immune1 Immune2 ISG15+ PAGE4+ Pericyte1 Pericyte2 Stroma_Unclassified THY1+
##   s1      65      22     46     81        44        21                1226    51
##   s2      60      24     34     57        69        24                 916    58
##   s3      54      29     51    100        29        28                1548    81

6 Subset

Previously defined clusters were subsetted and reanalyzed using similar workflow for higher resolution.

6.1 Stromal+Pericyte

estpu <- subset(e,cells=names(e$cell_type[e$cell_type=="Stromal"|e$cell_type=="Pericyte"]))
DefaultAssay(estpu) <- "RNA"
estpu <- DietSeurat(estpu,assays="RNA",counts=TRUE,scale.data=FALSE)
estpu$percent_mito <- PercentageFeatureSet(estpu, pattern="^MT-")

estpu.list <- SplitObject(estpu,split.by="orig.ident")
for (i in 1:length(estpu.list)) {
    estpu.list[[i]] <- SCTransform(estpu.list[[i]],return.only.var.genes=F,vars.to.regress=c("percent_mito","cc.diff"))
}
estpu.features <- SelectIntegrationFeatures(estpu.list,nfeatures=3000)
estpu.list <- Seurat::PrepSCTIntegration(estpu.list,anchor.features=estpu.features)
estpu.anchors <- FindIntegrationAnchors(object.list=estpu.list,normalization.method="SCT",anchor.features=estpu.features)
estp <- IntegrateData(anchorset=estpu.anchors,normalization.method="SCT")

estp <- RunPCA(estp,verbose=F,seed.use=100)
estp <- RunTSNE(estp,dims=1:28,seed.use=100)
estp <- RunUMAP(estp,dims=1:28,seed.use=100)
estp <- FindNeighbors(estp,dims=1:28)
estp <- FindClusters(estp,resolution=0.8,random.seed=0)
rm(estpu,estpu.anchors,estpu.features,estpu.list)
estp
## An object of class Seurat 
## 38286 features across 5877 samples within 3 assays 
## Active assay: integrated (3000 features, 3000 variable features)
##  2 other assays present: RNA, SCT
##  3 dimensional reductions calculated: pca, tsne, umap
p1 <- TSNEPlot(estp)
p2 <- TSNEPlot(estp,group.by="orig.ident")
p3 <- FeaturePlot(estp,red="tsne",features="percent_mito")
p4 <- FeaturePlot(estp,red="tsne",features="S.Score")
(p1|p2)/(p3|p4)

Fig. 17: Top left: tSNE plot showing clusters identified by Seurat. Top right: TSNE plot showing the original samples. Bottom left: tSNE plot showing percentage mitochondrial content in cells. Bottom right: tSNE plot showing dividing cells.

Pre-defined labels are used to define clusters.

Idents(estp) <- estp$cell_subtype
TSNEPlot(estp)

Fig. 18: tSNE plot showing custom clusters.

6.1.1 Markers

Marker genes are identified between all clusters by DE testing.

estp_markers_mast <- FindAllMarkers(estp,test.use="MAST",only.pos=TRUE,min.pct=0.25,
                 logfc.threshold=0.25,max.cells.per.ident=500,
                 min.cells.group=10,random.seed=100,return.thresh=0.05,
                 assay="SCT",slot="data")

estp_markers_mast1 <- estp_markers_mast %>% 
  filter(p_val_adj<0.05) %>%
  group_by(cluster) %>% 
  top_n(n=5,wt=abs(avg_logFC)) %>%
  arrange(cluster,-abs(avg_logFC))
DotPlot(estp,features=unique(estp_markers_mast1$gene))+
  coord_flip()+
  labs(x="",y="")+
  ggtitle("MAST")+
  theme_bw(base_size=7)+
  theme(axis.text.x = element_text(angle=45,hjust=1))

Fig. 19: Dotplot showing marker genes within stromal sub clusters. Colour denotes mean expression level. Stronger colour denotes higher expression. The size of the circles denote precentage of cells within that cluster that show expression.

estp_markers_mast2 <- estp_markers_mast %>% 
  filter(p_val_adj<0.05) %>%
  group_by(cluster) %>% 
  top_n(n=1,wt=abs(avg_logFC)) %>%
  arrange(cluster,-abs(avg_logFC))

FeaturePlot(estp,features=unique(estp_markers_mast2$gene),reduction="tsne")&
  theme(legend.position="top",
        legend.direction="horizontal",
        legend.justification=0.5)

Fig. 20: Dotplot showing marker genes within stromal sub clusters. Colour denotes mean expression level. Stronger colour denotes higher expression. The size of the circles denote precentage of cells within that cluster that show expression.

DoHeatmap(subset(estp,downsample=200),features=unique(estp_markers_mast1$gene))

Fig. 21: Top 5 (as in previous figure) marker genes discovered through DE is now shown as a heatmap. Rows denote genes and columns denote cells. Cells have been downsampled to maximum 200 cells per cluster to show a balanced view of all clusters.

table(estp$orig.ident)
table(estp$cell_type)
table(estp$cell_subtype)
table(estp$orig.ident,estp$cell_type)
table(estp$orig.ident,estp$cell_subtype)
## 
##   s1   s2   s3 
## 2000 1459 2418 
## 
## Pericyte  Stromal 
##      214     5663 
## 
##              ACTA2+               BMP7+             CTNNB1+     Cycling Stromal 
##                 672                 329                 241                  16 
##          ECM Stroma          Epithelial             Immune2              ISG15+ 
##                 154                   1                   3                 131 
##              PAGE4+           Pericyte1           Pericyte2 Stroma_Unclassified 
##                 238                 142                  73                3687 
##               THY1+ 
##                 190 
##     
##      Pericyte Stromal
##   s1       65    1935
##   s2       93    1366
##   s3       56    2362
##     
##      ACTA2+ BMP7+ CTNNB1+ Cycling Stromal ECM Stroma Epithelial Immune2 ISG15+
##   s1    257   147      62               8         56          0       2     46
##   s2    161    37      64               6         35          0       0     34
##   s3    254   145     115               2         63          1       1     51
##     
##      PAGE4+ Pericyte1 Pericyte2 Stroma_Unclassified THY1+
##   s1     81        44        21                1225    51
##   s2     57        69        24                 914    58
##   s3    100        29        28                1548    81

6.2 Stromal

estu <- subset(e,cells=names(e$cell_type[e$cell_type=="Stromal"]))
DefaultAssay(estu) <- "RNA"
estu <- DietSeurat(estu,assays="RNA",counts=TRUE,scale.data=FALSE)
estu$percent_mito <- PercentageFeatureSet(estu, pattern="^MT-")

estu.list <- SplitObject(estu,split.by="orig.ident")
for (i in 1:length(estu.list)) {
    estu.list[[i]] <- SCTransform(estu.list[[i]],return.only.var.genes=F,vars.to.regress=c("percent_mito","cc.diff"))
}
estu.features <- SelectIntegrationFeatures(estu.list,nfeatures=3000)
estu.list <- Seurat::PrepSCTIntegration(estu.list,anchor.features=estu.features)
estu.anchors <- FindIntegrationAnchors(object.list=estu.list,normalization.method="SCT",anchor.features=estu.features)
est <- IntegrateData(anchorset=estu.anchors,normalization.method="SCT")

est <- RunPCA(est,verbose=F,seed.use=100)
est <- RunTSNE(est,dims=1:28,seed.use=100)
est <- RunUMAP(est,dims=1:28,seed.use=100)
est <- FindNeighbors(est,dims=1:28)
est <- FindClusters(est,resolution=0.8,random.seed=0)
rm(estu,estu.anchors,estu.features,estu.list)
est
## An object of class Seurat 
## 38153 features across 5663 samples within 3 assays 
## Active assay: integrated (3000 features, 3000 variable features)
##  2 other assays present: RNA, SCT
##  3 dimensional reductions calculated: pca, tsne, umap
p1 <- TSNEPlot(est)
p2 <- TSNEPlot(est,group.by="orig.ident")
p3 <- FeaturePlot(est,red="tsne",features="percent_mito")
p4 <- FeaturePlot(est,red="tsne",features="S.Score")
(p1|p2)/(p3|p4)

Fig. 22: Top left: tSNE plot showing clusters identified by Seurat. Top right: TSNE plot showing the original samples. Bottom left: tSNE plot showing percentage mitochondrial content in cells. Bottom right: tSNE plot showing dividing cells.

Pre-defined labels are used to define clusters.

Idents(est) <- est$cell_subtype
TSNEPlot(est)

Fig. 23: tSNE plot showing custom clusters.

6.2.1 Markers

Marker genes are identified between all clusters by DE testing.

est_markers_mast <- FindAllMarkers(est,test.use="MAST",only.pos=TRUE,min.pct=0.25,
                 logfc.threshold=0.25,max.cells.per.ident=500,
                 min.cells.group=10,random.seed=100,return.thresh=0.05,
                 assay="SCT",slot="data")

est_markers_mast1 <- est_markers_mast %>% 
  filter(p_val_adj<0.05) %>%
  group_by(cluster) %>% 
  top_n(n=6,wt=abs(avg_logFC)) %>%
  arrange(cluster,-abs(avg_logFC))
DotPlot(est,features=unique(est_markers_mast1$gene))+
  coord_flip()+
  labs(x="",y="")+
  ggtitle("MAST")+
  theme_bw(base_size=7)+
  theme(axis.text.x = element_text(angle=45,hjust=1))

Fig. 24: Dotplot showing marker genes within stromal sub clusters. Colour denotes mean expression level. Stronger colour denotes higher expression. The size of the circles denote precentage of cells within that cluster that show expression.

est_markers_mast2 <- est_markers_mast %>% 
  filter(p_val_adj<0.05) %>%
  group_by(cluster) %>% 
  top_n(n=1,wt=abs(avg_logFC)) %>%
  arrange(cluster,-abs(avg_logFC))

FeaturePlot(est,features=unique(est_markers_mast2$gene),reduction="tsne")&
  theme(legend.position="top",
        legend.direction="horizontal",
        legend.justification=0.5)

Fig. 25: Dotplot showing marker genes within stromal sub clusters. Colour denotes mean expression level. Stronger colour denotes higher expression. The size of the circles denote precentage of cells within that cluster that show expression.

DoHeatmap(subset(est,downsample=200),features=unique(est_markers_mast1$gene))

Fig. 26: Top 5 (as in previous figure) marker genes discovered through DE is now shown as a heatmap. Rows denote genes and columns denote cells. Cells have been downsampled to maximum 200 cells per cluster to show a balanced view of all clusters.

table(est$orig.ident)
table(est$cell_type)
table(est$cell_subtype)
table(est$orig.ident,est$cell_type)
table(est$orig.ident,est$cell_subtype)
## 
##   s1   s2   s3 
## 1935 1366 2362 
## 
## Stromal 
##    5663 
## 
##              ACTA2+               BMP7+             CTNNB1+     Cycling Stromal 
##                 672                 329                 241                  15 
##          ECM Stroma          Epithelial             Immune2              ISG15+ 
##                 154                   1                   3                 131 
##              PAGE4+           Pericyte2 Stroma_Unclassified               THY1+ 
##                 238                   2                3687                 190 
##     
##      Stromal
##   s1    1935
##   s2    1366
##   s3    2362
##     
##      ACTA2+ BMP7+ CTNNB1+ Cycling Stromal ECM Stroma Epithelial Immune2 ISG15+
##   s1    257   147      62               8         56          0       2     46
##   s2    161    37      64               5         35          0       0     34
##   s3    254   145     115               2         63          1       1     51
##     
##      PAGE4+ Pericyte2 Stroma_Unclassified THY1+
##   s1     81         0                1225    51
##   s2     57         1                 914    58
##   s3    100         1                1548    81

6.3 Pericyte

Data is not integrated due to insufficient number of cells in this dataset. Individual effect is instead corrected through vars.to.regress.

# reanalyse pericytes
ep <- subset(e,cells=names(e$cell_type[e$cell_type=="Pericyte"]))
DefaultAssay(ep) <- "RNA"
ep <- DietSeurat(ep,assay="RNA")
ep$percent_mito <- PercentageFeatureSet(ep, pattern="^MT-")
ep <- NormalizeData(ep)
ep <- ScaleData(ep,vars.to.regress=c("orig.ident","percent_mito","cc.diff"))
ep <- FindVariableFeatures(ep,selection.method="vst",nfeatures=5000)
ep <- RunPCA(ep,verbose=F,seed.use=100)
ep <- RunTSNE(ep,dims=1:13,seed.use=100)
ep <- RunUMAP(ep,dims=1:13,seed.use=100)
ep <- FindNeighbors(ep,dims=1:11)
ep <- FindClusters(ep,resolution=1.4,random.seed=0)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 214
## Number of edges: 7811
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.3309
## Number of communities: 8
## Elapsed time: 0 seconds
p1 <- TSNEPlot(ep)
p2 <- TSNEPlot(ep,group.by="orig.ident")
p3 <- FeaturePlot(ep,red="tsne",features="percent_mito")
p4 <- FeaturePlot(ep,red="tsne",features="S.Score")
(p1|p2)/(p3|p4)

Fig. 27: Top left: tSNE plot showing clusters identified by Seurat. Top right: TSNE plot showing the original samples. Bottom left: tSNE plot showing percentage mitochondrial content in cells. Bottom right: tSNE plot showing dividing cells.

Pre-defined labels are used to define clusters.

Idents(ep) <- ep$cell_subtype
TSNEPlot(ep)

Fig. 28: tSNE plot showing custom clusters.

6.3.1 Markers

Marker genes are identified between all clusters by DE testing.

ep_markers_mast <- FindAllMarkers(ep,test.use="MAST",only.pos=TRUE,
                                  min.pct=0.25,logfc.threshold=0.25,
                 min.cells.group=10,random.seed=100,return.thresh=0.05,
                 assay="RNA",slot="data",latent.vars=c("orig.ident","percent_mito","cc.diff"))

ep_markers_mast1 <- ep_markers_mast %>% 
  filter(p_val_adj<0.05) %>%
  group_by(cluster) %>% 
  top_n(n=5,wt=abs(avg_logFC)) %>%
  arrange(cluster,-abs(avg_logFC))
DotPlot(ep,features=unique(ep_markers_mast1$gene))+
  coord_flip()+
  labs(x="",y="")+
  theme_bw(base_size=7)+
  theme(axis.text.x = element_text(angle=45,hjust=1))

Fig. 29: Dotplot showing marker genes within stromal sub clusters. Colour denotes mean expression level. Stronger colour denotes higher expression. The size of the circles denote precentage of cells within that cluster that show expression.

DoHeatmap(ep,features=ep_markers_mast1$gene)

Fig. 30: Top 5 (as in previous figure) marker genes discovered through DE is now shown as a heatmap. Rows denote genes and columns denote cells.

table(ep$orig.ident)
table(ep$cell_type)
table(ep$cell_subtype)
table(ep$orig.ident,ep$cell_type)
table(ep$orig.ident,ep$cell_subtype)
## 
## s1 s2 s3 
## 65 93 56 
## 
##         Stromal        Pericyte         Immune1      Epithelial         Immune2 
##               0             214               0               0               0 
## Cycling Stromal     Endothelial 
##               0               0 
## 
## Cycling Stromal       Pericyte1       Pericyte2 
##               1             142              71 
##     
##      Stromal Pericyte Immune1 Epithelial Immune2 Cycling Stromal Endothelial
##   s1       0       65       0          0       0               0           0
##   s2       0       93       0          0       0               0           0
##   s3       0       56       0          0       0               0           0
##     
##      Cycling Stromal Pericyte1 Pericyte2
##   s1               0        44        21
##   s2               1        69        23
##   s3               0        29        27

7 Velocyto

RNA velocity was estimate using the tool Velocyto. Velocyto was run through a docker container retrieved from dockerhub repository (genomicpariscentre/velocyto). The 10X directories (CellRanger output) including BAM files and hg19 annotations were provided as input. The output was loom files with three assays: spliced, unspliced and ambiguous counts.

# code not evaluated

docker pull genomicpariscentre/velocyto
docker run -v /project/data/raw:/home/ --name 10X_17_107 genomicpariscentre/velocyto velocyto run10x /home/10X_17_107/ /home/hg19-1.2.0-genes.gtf
docker run -v /project/data/raw:/home/ --name 10X_17_109 genomicpariscentre/velocyto velocyto run10x /home/10X_17_109/ /home/hg19-1.2.0-genes.gtf
docker run -v /project/data/raw:/home/ --name 10X_18_008 genomicpariscentre/velocyto velocyto run10x /home/10X_18_008/ /home/hg19-1.2.0-genes.gtf

Read loom files with spliced, unspliced and ambiguous counts. Labels are corrected and converted to Seurat objects.

set.seed(123)
library(velocyto.R)
library(SeuratWrappers)

s1 <- read.loom.matrices("./data/processed/velocyto/10X_17_107.loom")
s2 <- read.loom.matrices("./data/processed/velocyto/10X_17_109.loom")
s3 <- read.loom.matrices("./data/processed/velocyto/10X_18_008.loom")

colnames(s1$spliced) <- paste0(stringr::str_replace(stringr::str_replace(stringr::str_replace(stringr::str_replace(colnames(s1$spliced),"^10X_17_107:","s1_"),"^10X_17_109_rerun:","s2_"),"^10X_18_008_rerun:","s3_"),"x$",""),"-1")
colnames(s2$spliced) <- paste0(stringr::str_replace(stringr::str_replace(stringr::str_replace(stringr::str_replace(colnames(s2$spliced),"^10X_17_107:","s1_"),"^10X_17_109_rerun:","s2_"),"^10X_18_008_rerun:","s3_"),"x$",""),"-1")
colnames(s3$spliced) <- paste0(stringr::str_replace(stringr::str_replace(stringr::str_replace(stringr::str_replace(colnames(s3$spliced),"^10X_17_107:","s1_"),"^10X_17_109_rerun:","s2_"),"^10X_18_008_rerun:","s3_"),"x$",""),"-1")

colnames(s1$unspliced) <- paste0(stringr::str_replace(stringr::str_replace(stringr::str_replace(stringr::str_replace(colnames(s1$unspliced),"^10X_17_107:","s1_"),"^10X_17_109_rerun:","s2_"),"^10X_18_008_rerun:","s3_"),"x$",""),"-1")
colnames(s2$unspliced) <- paste0(stringr::str_replace(stringr::str_replace(stringr::str_replace(stringr::str_replace(colnames(s2$unspliced),"^10X_17_107:","s1_"),"^10X_17_109_rerun:","s2_"),"^10X_18_008_rerun:","s3_"),"x$",""),"-1")
colnames(s3$unspliced) <- paste0(stringr::str_replace(stringr::str_replace(stringr::str_replace(stringr::str_replace(colnames(s3$unspliced),"^10X_17_107:","s1_"),"^10X_17_109_rerun:","s2_"),"^10X_18_008_rerun:","s3_"),"x$",""),"-1")

colnames(s1$ambiguous) <- paste0(stringr::str_replace(stringr::str_replace(stringr::str_replace(stringr::str_replace(colnames(s1$ambiguous),"^10X_17_107:","s1_"),"^10X_17_109_rerun:","s2_"),"^10X_18_008_rerun:","s3_"),"x$",""),"-1")
colnames(s2$ambiguous) <- paste0(stringr::str_replace(stringr::str_replace(stringr::str_replace(stringr::str_replace(colnames(s2$ambiguous),"^10X_17_107:","s1_"),"^10X_17_109_rerun:","s2_"),"^10X_18_008_rerun:","s3_"),"x$",""),"-1")
colnames(s3$ambiguous) <- paste0(stringr::str_replace(stringr::str_replace(stringr::str_replace(stringr::str_replace(colnames(s3$ambiguous),"^10X_17_107:","s1_"),"^10X_17_109_rerun:","s2_"),"^10X_18_008_rerun:","s3_"),"x$",""),"-1")

s1 <- as.Seurat(s1)
s2 <- as.Seurat(s2)
s3 <- as.Seurat(s3)
## reading loom file via hdf5r...
## reading loom file via hdf5r...
## reading loom file via hdf5r...

Runs are merged into a single object. Default assay is set to spliced. Cell types of interest are subsetted to create a new object. Standard Seurat processing is performed followed by dimensionality reduction and custom cell labels are added.

sx <- merge(s1,y=c(s2,s3),project="endo")
DefaultAssay(sx) <- "spliced"

s.genes <- cc.genes$s.genes
g2m.genes <- cc.genes$g2m.genes
sx <- CellCycleScoring(sx,s.features=s.genes,g2m.features=g2m.genes,set.ident=F)
sx$cc.diff <- sx$S.Score-sx$G2M.Score
sx$percent_mito <- PercentageFeatureSet(sx, pattern="^MT-")

# subset data
vsp <- subset(sx,cells=names(e$cell_type[e$cell_subtype=="ACTA2+" | e$cell_subtype=="THY1+" | e$cell_subtype=="Pericyte1" | e$cell_subtype=="Pericyte2"]))
vsp <- subset(vsp,subset=nFeature_spliced>200 & nFeature_spliced<5000 & percent_mito<10)
vsp <- subset(vsp,features=grep(pattern="^ERCC|^RPL|^RPS",x=rownames(as.matrix(GetAssayData(vsp))),value=TRUE,invert=T))

vsp <- NormalizeData(vsp)
vsp <- ScaleData(vsp,vars.to.regress=c("orig.ident","percent_mito","cc.diff"))
vsp <- ScaleData(vsp,vars.to.regress=c("orig.ident","percent_mito","cc.diff"),assay = "unspliced")
vsp <- FindVariableFeatures(vsp,selection.method="vst")
vsp <- FindVariableFeatures(vsp,selection.method="vst",assay ="unspliced")
vsp <- RunPCA(vsp,seed.use=42,verbose=FALSE)
vsp <- RunPCA(vsp,seed.use=42,verbose=FALSE,assay="unspliced")
vsp <- RunTSNE(vsp,dims=1:12,seed.use=100)
vsp <- RunTSNE(vsp,dims=1:15,seed.use=100,assay="unspliced")
vsp <- RunUMAP(vsp,dims=1:12,seed.use=100)
vsp <- RunUMAP(vsp,dims=1:15,seed.use=100,assay="unspliced")

vsp$cell_type <- e$cell_type
vsp$cell_subtype <- e$cell_subtype
Idents(vsp) <- factor(as.character(vsp$cell_subtype),levels=c("ACTA2+","THY1+","Pericyte1","Pericyte2"))
rm(s1,s2,s3,sx,s.genes,g2m.genes)

RNA velocity is run through Seurat wrapper function and velocity estimates are visualized as vector fields.

vsp <- RunVelocity(vsp,deltaT=1,kCells=25,fit.quantile=0.02)
#saveRDS(vsp,file.path(path_export,"velocyto-stromapericyte.Rds"))
ident.colors <- c('#95CC5EFF',"#5C88DAFF",'#017A4AFF','#FFCE4EFF')
names(ident.colors) <- levels(vsp)
cell.colors <- ident.colors[Idents(vsp)]
names(cell.colors) <- colnames(vsp)

n <- 1:length(colnames(vsp))
#n <- sample(1:length(colnames(vsp)),200,replace=F)

png(file.path(path_export,"velocyto-stromapericyte-umap.png"),height=16,width=15,units="cm",res=300)
show.velocity.on.embedding.cor(emb=Embeddings(object=vsp, reduction="umap")[n,]/2, vel=Tool(object=subset(vsp,cells=colnames(vsp)[n]), slot="RunVelocity"), n=250, scale="sqrt", cell.colors=ac(x=cell.colors, alpha=0.5), cex=0.8, arrow.scale=2, show.grid.flow=TRUE, min.grid.cell.mass=0.5, grid.n=40, arrow.lwd=0.8,do.par=FALSE, cell.border.alpha=0.1,n.cores=8)
#text(x=c(1.5,3.2,1.7,3.5,-1.2,0.15,0),y=c(-1,-0.3,1.1,1.8,2.0,2.4,3.1),labels=c("Stroma","Immune2","Epithelial","Immune1","Pericyte","Cycling Stroma","Endothelial"),cex=0.9)
legend("topright",legend=levels(vsp),pch=20,col=ident.colors,bty="n",pt.cex=1,cex=0.6)
title(main="Velocyto: RNA velocity on UMAP projection")
dev.off()

png(file.path(path_export,"velocyto-stromapericyte-tsne.png"),height=16,width=15,units="cm",res=300)
show.velocity.on.embedding.cor(emb=Embeddings(object=vsp, reduction="tsne")[n,]/12, vel=Tool(object=subset(vsp,cells=colnames(vsp)[n]), slot="RunVelocity"), n=250, scale="sqrt", cell.colors=ac(x=cell.colors, alpha=0.5), cex=0.8, arrow.scale=2.5, show.grid.flow=TRUE, min.grid.cell.mass=0.5, grid.n=40, arrow.lwd=0.8,do.par=FALSE, cell.border.alpha=0.1,n.cores=8)
legend("bottomright",legend=levels(vsp),pch=20,col=ident.colors,bty="n",pt.cex=1,cex=0.6)
title(main="Velocyto: RNA velocity on tSNE projection")
dev.off()
## delta projections ... sqrt knn ... transition probs ... done
## calculating arrows ... done
## grid estimates ... grid.sd= 0.082234  min.arrow.size= 0.00164468  max.grid.arrow.length= 0.04031543  done
## png 
##   2 
## delta projections ... sqrt knn ... transition probs ... done
## calculating arrows ... done
## grid estimates ... grid.sd= 0.07223259  min.arrow.size= 0.001444652  max.grid.arrow.length= 0.04031543  done
## png 
##   2
knitr::include_graphics(file.path(path_export,"velocyto-stromapericyte-tsne.png"))

Fig. 31: RNA velocity vector field visualised over a tSNE projection. Dataset shown is subsetted stroma and pericytes.

knitr::include_graphics(file.path(path_export,"velocyto-stromapericyte-umap.png"))

Fig. 31: RNA velocity vector field visualised over a tSNE projection. Dataset shown is subsetted stroma and pericytes.

8 External data

The fetal-maternal (FM) interface 10X dataset from Vento-Tormo et al. 2018 was downloaded from ArrayExpress.

#' @title fix_genes
#' @description Function to convert ensembl gene ids to gene names in fetal-maternal seurat object.
#' @param obj A seurat object.
#'
fix_genes <- function(obj) {
  
  # find unduplicated genes
  genes_list <- rownames(GetAssayData(obj,slot="counts"))
  genes_to_keep <- !duplicated(sub("-ENS.+$","",genes_list))
  orig_genes_to_keep <- genes_list[genes_to_keep]
    
  # slim down
  obj <- DietSeurat(obj,features=orig_genes_to_keep)
  
  # fix gene names
  rownames(obj@assays$RNA@counts) <- sub("-ENS.+$","",rownames(obj@assays$RNA@counts))
  rownames(obj@assays$RNA@data) <- sub("-ENS.+$","",rownames(obj@assays$RNA@data))
  obj@assays$RNA@meta.features <- obj@assays$RNA@meta.features[genes_to_keep,]
  rownames(obj@assays$RNA@meta.features) <- sub("-ENS.+$","",rownames(obj@assays$RNA@meta.features))
  obj@assays$RNA@var.features <- character()

  return(obj)
}

From the FM data, decidua samples are selected and ensembl gene ids are converted to gene names.

set.seed(123)

f <- readRDS("data/processed/fetal-maternal/emtab-6701.Rds")

# keep decidua samples
soi <- c("FCA7167222","FCA7167224","FCA7167226","FCA7196219","FCA7196225","FCA7474063")
soi <- rownames(f[[]][f[[]]$orig.ident %in% soi,])
f <- subset(f,cells=soi)

# change ens to gene name
library(org.Hs.eg.db)
library(clusterProfiler)

f <- fix_genes(f)

8.1 Stroma+Pericyte

Storma and pericyte cell types are subsetted for reanalysis.

f1 <- subset(f,subset=annotation %in% c("dS1","dS2","dS3","dP1","dP2"))
f1 <- NormalizeData(f1)
f1 <- ScaleData(f1,vars.to.regress=c("nFeature_RNA"))
f1 <- FindVariableFeatures(f1,selection.method="vst",nfeatures=5000)
f1 <- RunPCA(f1,verbose=F,seed.use=100)
f1 <- RunTSNE(f1,dims=1:13,seed.use=100)
f1 <- RunUMAP(f1,dims=1:13,seed.use=100)
UMAPPlot(f1)

Fig. 32: UMAP scatterplot showing stromal and pericyte cells.

8.2 Integration

FM decidua and endo datasets are merged and integrated (over individuals/runs), followed by standard Seurat processing.

DefaultAssay(f) <- "RNA"
f <- DietSeurat(f,assays="RNA")
f$batch <- "fm"

DefaultAssay(e) <- "RNA"
e1 <- DietSeurat(e,assays="RNA")
e1$batch <- "en"

efm <- merge(e1,f)
efm.list <- SplitObject(efm,split.by="orig.ident")

for(i in 1:length(efm.list)) {
    efm.list[[i]] <- NormalizeData(efm.list[[i]], verbose=FALSE)
    efm.list[[i]] <- FindVariableFeatures(efm.list[[i]], selection.method="vst",
        nfeatures=5000, verbose=FALSE)
}

efm.features <- SelectIntegrationFeatures(efm.list,nfeatures=5000)
efm.anchors <- FindIntegrationAnchors(object.list=efm.list,anchor.features=efm.features)
efm <- IntegrateData(anchorset=efm.anchors)
DefaultAssay(efm) <- "integrated"
efm <- ScaleData(efm)
Idents(efm) <- efm$orig.ident
efm

efm <- RunPCA(efm,verbose=F,seed.use=42)
efm <- RunTSNE(efm,dims=1:30,seed.use=100)
efm <- RunUMAP(efm,dims=1:30,seed.use=100)
## An object of class Seurat 
## 32197 features across 20629 samples within 2 assays 
## Active assay: integrated (5000 features, 5000 variable features)
##  1 other assay present: RNA
rm(f,e1,efm.list,efm.features,efm.anchors)

Dataset prefix is added to cell type annotations to enable identification of cells.

efm$annotation1 <- recode(efm$annotation,`dM1`="M",`dM2`="M",`dM3`="M",`dNK1`="Immune",`dNK2`="Immune",`dNK3`="Immune",`Endo (f)`="Endothelial",`Endo (m)`="Endothelial",`Endo L`="Endothelial",`Epi1`="Epithelial",`Epi2`="Epithelial",`fFB1`="Fibroblast",`fFB2`="Fibroblast",`dS1`="Stromal",`dS2`="Stromal",`dS3`="Stromal",`dP1`="Pericyte",`dP2`="Pericyte",`Stroma5`="Myofibroblast",`Tcells`="Immune")

efm$type <- ifelse(is.na(efm$annotation1),paste0("en-",efm$cell_type),paste0("fm-",efm$annotation1))
efm$subtype <- ifelse(is.na(efm$annotation1),paste0("en-",efm$cell_subtype),paste0("fm-",efm$annotation1))

#saveRDS(efm,file.path(path_export,"seurat-fetal-maternal-combined.Rds"))
TSNEPlot(efm,group.by="type")+
  guides(color=guide_legend(nrow=6,override.aes=list(size=2)))+
  theme(legend.position="top")

Fig. 33: tSNE scatterplot showing FM dataset integrated with current data. # Session

R packages in use and versions.

## R version 4.0.2 (2020-06-22)
## Platform: x86_64-conda_cos6-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
## 
## Matrix products: default
## BLAS/LAPACK: /home/roy/miniconda3/envs/r-4.0/lib/libopenblasp-r0.3.10.so
## 
## locale:
##  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
##  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
##  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] writexl_1.3.1               readxl_1.3.1               
##  [3] pheatmap_1.0.12             stringr_1.4.0              
##  [5] clusterProfiler_3.18.0      org.Hs.eg.db_3.12.0        
##  [7] AnnotationDbi_1.52.0        SeuratWrappers_0.3.0       
##  [9] velocyto.R_0.6              Matrix_1.3-2               
## [11] BiocManager_1.30.10         scater_1.18.3              
## [13] SingleR_1.4.0               celldex_1.0.0              
## [15] SoupX_1.4.8                 MAST_1.17.3                
## [17] SingleCellExperiment_1.12.0 SummarizedExperiment_1.20.0
## [19] Biobase_2.50.0              GenomicRanges_1.42.0       
## [21] GenomeInfoDb_1.26.2         IRanges_2.24.1             
## [23] S4Vectors_0.28.1            BiocGenerics_0.36.0        
## [25] MatrixGenerics_1.2.0        matrixStats_0.57.0         
## [27] sctransform_0.3.2           Seurat_3.2.3               
## [29] lubridate_1.7.9.2           leaflet_2.0.4.1            
## [31] fontawesome_0.2.1           captioner_2.2.3            
## [33] yaml_2.2.1                  formattable_0.2.1          
## [35] kableExtra_1.3.4            gdtools_0.2.3              
## [37] ggnewscale_0.4.5            gganimate_1.0.7            
## [39] plotly_4.9.3                ggiraph_0.7.8              
## [41] randomcoloR_1.1.0.1         ggpointdensity_0.1.0       
## [43] patchwork_1.1.1             ggplot2_3.3.3              
## [45] tidyr_1.1.2                 dplyr_1.0.2                
## [47] bookdown_0.21               knitr_1.30                 
## 
## loaded via a namespace (and not attached):
##   [1] rappdirs_0.3.1                scattermore_0.7              
##   [3] bit64_4.0.5                   irlba_2.3.3                  
##   [5] DelayedArray_0.16.0           data.table_1.13.6            
##   [7] rpart_4.1-15                  RCurl_1.98-1.2               
##   [9] generics_0.1.0                cowplot_1.1.1                
##  [11] RSQLite_2.2.2                 shadowtext_0.0.7             
##  [13] RANN_2.6.1                    future_1.21.0                
##  [15] enrichplot_1.10.1             bit_4.0.4                    
##  [17] spatstat.data_2.1-0           webshot_0.5.2                
##  [19] xml2_1.3.2                    httpuv_1.5.5                 
##  [21] assertthat_0.2.1              isoband_0.2.3                
##  [23] gifski_1.4.3-1                viridis_0.5.1                
##  [25] xfun_0.22                     hms_1.0.0                    
##  [27] jquerylib_0.1.3               evaluate_0.14                
##  [29] promises_1.1.1                progress_1.2.2               
##  [31] dbplyr_2.0.0                  igraph_1.2.6                 
##  [33] DBI_1.1.0                     htmlwidgets_1.5.3            
##  [35] purrr_0.3.4                   ellipsis_0.3.1               
##  [37] RSpectra_0.16-0               crosstalk_1.1.1              
##  [39] V8_3.4.0                      deldir_0.2-3                 
##  [41] sparseMatrixStats_1.2.0       vctrs_0.3.6                  
##  [43] remotes_2.2.0                 ROCR_1.0-11                  
##  [45] abind_1.4-5                   withr_2.3.0                  
##  [47] ggforce_0.3.2                 prettyunits_1.1.1            
##  [49] goftest_1.2-2                 svglite_2.0.0                
##  [51] cluster_2.1.0                 DOSE_3.16.0                  
##  [53] ExperimentHub_1.16.0          lazyeval_0.2.2               
##  [55] crayon_1.3.4                  hdf5r_1.3.3                  
##  [57] pkgconfig_2.0.3               labeling_0.4.2               
##  [59] tweenr_1.0.1                  nlme_3.1-151                 
##  [61] vipor_0.4.5                   rlang_0.4.11                 
##  [63] globals_0.14.0                lifecycle_1.0.0              
##  [65] miniUI_0.1.1.1                downloader_0.4               
##  [67] BiocFileCache_1.14.0          rsvd_1.0.3                   
##  [69] AnnotationHub_2.22.0          cellranger_1.1.0             
##  [71] polyclip_1.10-0               lmtest_0.9-38                
##  [73] zoo_1.8-8                     beeswarm_0.2.3               
##  [75] ggridges_0.5.3                png_0.1-7                    
##  [77] viridisLite_0.3.0             bitops_1.0-6                 
##  [79] KernSmooth_2.23-18            blob_1.2.1                   
##  [81] DelayedMatrixStats_1.12.1     qvalue_2.22.0                
##  [83] parallelly_1.23.0             beachmat_2.6.4               
##  [85] scales_1.1.1                  memoise_1.1.0                
##  [87] magrittr_2.0.1                plyr_1.8.6                   
##  [89] hexbin_1.28.2                 ica_1.0-2                    
##  [91] zlibbioc_1.36.0               scatterpie_0.1.5             
##  [93] compiler_4.0.2                RColorBrewer_1.1-2           
##  [95] pcaMethods_1.82.0             fitdistrplus_1.1-3           
##  [97] XVector_0.30.0                listenv_0.8.0                
##  [99] pbapply_1.4-3                 MASS_7.3-53                  
## [101] mgcv_1.8-33                   tidyselect_1.1.0             
## [103] stringi_1.5.3                 GOSemSim_2.16.1              
## [105] BiocSingular_1.6.0            ggrepel_0.9.0                
## [107] grid_4.0.2                    sass_0.3.1                   
## [109] fastmatch_1.1-0               tools_4.0.2                  
## [111] future.apply_1.7.0            rstudioapi_0.13              
## [113] uuid_0.1-4                    gridExtra_2.3                
## [115] farver_2.0.3                  Rtsne_0.15                   
## [117] ggraph_2.0.4                  rvcheck_0.1.8                
## [119] digest_0.6.27                 shiny_1.5.0                  
## [121] Rcpp_1.0.5                    scuttle_1.0.4                
## [123] BiocVersion_3.12.0            later_1.1.0.1                
## [125] RcppAnnoy_0.0.18              httr_1.4.2                   
## [127] colorspace_2.0-0              rvest_1.0.0                  
## [129] tensor_1.5                    reticulate_1.18              
## [131] splines_4.0.2                 uwot_0.1.10                  
## [133] spatstat.utils_2.1-0          graphlayouts_0.7.1           
## [135] systemfonts_1.0.1             xtable_1.8-4                 
## [137] jsonlite_1.7.2                tidygraph_1.2.0              
## [139] spatstat_1.64-1               R6_2.5.0                     
## [141] pillar_1.4.7                  htmltools_0.5.1.1            
## [143] mime_0.9                      glue_1.4.2                   
## [145] fastmap_1.0.1                 BiocParallel_1.24.1          
## [147] BiocNeighbors_1.8.2           interactiveDisplayBase_1.28.0
## [149] codetools_0.2-18              fgsea_1.16.0                 
## [151] lattice_0.20-41               bslib_0.2.4                  
## [153] tibble_3.0.4                  curl_4.3                     
## [155] ggbeeswarm_0.6.0              leiden_0.3.6                 
## [157] GO.db_3.12.1                  survival_3.2-7               
## [159] rmarkdown_2.8                 munsell_0.5.0                
## [161] DO.db_2.9                     GenomeInfoDbData_1.2.4       
## [163] xaringan_0.20                 reshape2_1.4.4               
## [165] gtable_0.3.0

Time difference of 2.782938 hours