More than likely, you are going to want to use another package to
analyze the simulated data produced by {scSpatialSIM}
.
Here, we will show how the output from {scSpatialSIM}
works
seemlessly with the spatial analysis package,
{spatialTIME}
First thing we need to do is simulate data. The
vignette("Introduction)
showed us how we can do this in a
univariate way where only a single cells positivity was simulated and
bivaraiate way where we get positivity for multiple cell types. We can
use this data with {spatialTIME}
to derive spatial
statistics like Ripley’s K. See the spatialTIME package for
more information about everything the package can do.
In this vignette, we will stick with a single cell type and look at univariate clustering with and without holes to display the caution that should be taken when using the observed Ripley’s K values.
#set seed
set.seed(333)
#create the new object
sim_object = CreateSimulationObject(sims = 5, cell_types = 1) %>%
#produce the point pattern
GenerateSpatialPattern()
#> No `window` specified - defaulting to x (0, 10); y (0, 10)
#make tissues
sim_object = GenerateTissue(sim_object, density_heatmap = F) %>%
#create holes
GenerateHoles(hole_prob = c(0.3, 0.5), density_heatmap = F) %>%
#Create positive/negative cells
GenerateCellPositivity(probs = c(0, 0.9))
#> Computing tissue probability
#> Computing hole probability
#> Computing probability for Cell 1
With the simulation object filled in with a single cell type, it can
be split into a spatial list as well as a summary using
CreateSpatialList()
and
SummariseSpatial()
.
#creating the spatial list
spatial_list = CreateSpatialList(sim_object, single_df = F)
#summarise the spatial list
summary_df = SummariseSpatial(spatial_list, markers = "Cell 1 Assignment")
head(summary_df)
#> # A tibble: 5 × 4
#> `Sample Tag` `Total Cells` `Cell 1 Assignment` `% Cell 1 Assignment`
#> <chr> <int> <dbl> <dbl>
#> 1 Spatial Data 1 2495 673 27.0
#> 2 Spatial Data 2 2450 475 19.4
#> 3 Spatial Data 3 2500 582 23.3
#> 4 Spatial Data 4 2514 525 20.9
#> 5 Spatial Data 5 2486 450 18.1
spatialTIME
library(spatialTIME)
#> spatialTIME version:
#> 1.3.4.5
#> If using for publication, please cite our manuscript:
#> https://doi.org/10.1093/bioinformatics/btab757
The {spatialTIME}
package requires the input of 3
things. 1) spatial data frames in list format, 2) a summary of those
spatial data frames, and 3) a clinical data frames The summary data
frame acts as a linker between the spatial data frame IDs and the
entries in the clinical data frame. Our spatial data frames need 1 more
column: their name. This is pretty easy to do.
#loop over all spatial data frames and add their names
sf_names = names(spatial_list)
spatial_list = lapply(setNames(sf_names, sf_names), function(nam){
spatial_list[[nam]] %>%
dplyr::mutate(`Sample Tag` = nam, .before = 1)
})
Our summary data frame already contains the names of the spatial data frames but needs a patient ID added.
The number 3 data we needed was the clinical data frame. For the sake of this example we will just make a single columned data frame with 1 through 5 to match the “Patient ID” in the summary data.
These column names will be used when creating the mIF
object along with the 3 data objects.
mif = create_mif(clinical_data = clinical,
sample_data = summary_df,
spatial_list = spatial_list,
patient_id = "Patient ID",
sample_id = "Sample Tag")
mif
#> 5 patients spanning 5 samples and 5 spatial data frames were found
Since our window is from 0 to 10 (10 units), we should keep our search radius max less than 5. Lets do Univariate Ripley’s K from 0 to 3 units and visualize the results.
mif = ripleys_k(mif = mif, mnames = "Cell 1 Assignment", r_range = seq(0, 3, 0.1),
permute = FALSE, edge_correction = "translation", workers = 1,
xloc = "x", yloc = "y")
#> Joining with `by = join_by(r)`
#> Joining with `by = join_by(r)`
#> Joining with `by = join_by(r)`
#> Joining with `by = join_by(r)`
#> Joining with `by = join_by(r)`
library(ggplot2)
mif$derived$univariate_Count %>%
ggplot() +
geom_line(aes(x = r, y = `Degree of Clustering Exact`, color = `Sample Tag`)) +
labs(title = "Univariate Clustering - Simulation")
Up until now, we have been using the all simulated cells. We can
filter the spatial_list down to create another mIF object that doesn’t
have the cells belonging to the class of “hole”
(Hole Assignment
== “Keep”). For now, sub-setting to only
Tissue 1 or Tissue 2 will be left alone. Let’s create this mIF and run
Ripley’s K on it.
mif_holes = create_mif(clinical_data = clinical,
sample_data = summary_df,
spatial_list = lapply(spatial_list, function(spat){
spat %>%
dplyr::filter(`Hole Assignment` == "Keep")
}),
patient_id = "Patient ID",
sample_id = "Sample Tag")
mif_holes = ripleys_k(mif = mif_holes, mnames = "Cell 1 Assignment", r_range = seq(0, 3, 0.1),
permute = FALSE, edge_correction = "translation", workers = 1,
xloc = "x", yloc = "y")
#> Joining with `by = join_by(r)`
#> Joining with `by = join_by(r)`
#> Joining with `by = join_by(r)`
#> Joining with `by = join_by(r)`
#> Joining with `by = join_by(r)`
To see the importance of using the exact CSR approach (or permutation) when there are holes on samples, first look at the spatial plots to see how much the holes are impacting the distribution of the points.
mif_holes$spatial %>%
do.call(dplyr::bind_rows, .) %>%
ggplot() +
geom_point(aes(x = x, y = y, color = as.factor(`Cell 1 Assignment`))) +
facet_wrap(~`Sample Tag`)
The regions with holes are rather small so wouldn’t
dat = do.call(dplyr::bind_rows, mif_holes$derived$univariate_Count)
dat %>%
dplyr::mutate(`Exact-Theo` = `Exact CSR` - `Theoretical CSR`) %>%
ggplot() +
geom_density(aes(x = `Exact-Theo`, fill = `Sample Tag`), alpha = 0.2, adjust = 0.2)
This clearly shows that the exact typically has larger values than those of the theoretical (positive values). These can be visualized by calculating the frequency of rs that have an exact CSR measurement greater than the theoretical CSR estimate, indicating the the theoretical is underestimating the CSR of the cores.
dat %>%
dplyr::mutate(`Exact-Theo` = `Exact CSR` - `Theoretical CSR`) %>%
dplyr::group_by(`Sample Tag`) %>%
dplyr::mutate(prop = ifelse(`Exact-Theo` > 0, 1/dplyr::n(), 0)) %>%
dplyr::select(`Sample Tag`, r, `Exact-Theo`, prop) %>%
dplyr::summarise(`Total Fraction` = sum(prop))
#> # A tibble: 5 × 2
#> `Sample Tag` `Total Fraction`
#> <chr> <dbl>
#> 1 Spatial Data 1 0.355
#> 2 Spatial Data 2 0.645
#> 3 Spatial Data 3 0.968
#> 4 Spatial Data 4 0.968
#> 5 Spatial Data 5 0.968