1. Environmental data and heterogeneity

We selected key environmental factors that are commonly used to examine biodiversity-environment associations. These variables included mean annual temperature, mean annual precipitation, annual net primary productivity and elevation. Mean annual temperature, mean annual precipitation and elevation were downloaded as raster layers from the WorldClim database1 at a resolution of 2.5’. Annual net primary productivity was downloaded from NASA Moderate Resolution Imaging Spectroradiometer (MODIS) at a resolution of 1 km and calculated using the MOD17 algorithm. These variables were converted to Behrmann equal-area projection using the function projectRaster in the R package raster.2

We defined ‘environmental heterogeneity’ as the variation of environmental factors in each cell, obtained by taking the standard deviation of the environmental variables (temperature, precipitation, elevation and productivity) in each cell and across grain sizes (50, 100, 200, 400 and 800 km). Henceforth, we refer to the standard deviation of these four environmental variables i.e., mean annual temperature, mean annual precipitation, annual net primary productivity and elevation, as temperature variation, precipitation variation, productivity variation and elevation variation, respectively.

2. Environmental heterogeneity

To measure environmental heterogeneity, for each cell and across grain sizes (50, 100, 200, 400 and 800 km), we extracted the standard deviation of the four environmental variables (temperature, precipitation, elevation and productivity) using the function extract_climate in the R package phyloregionExtras:3

extract_climate <- function(x, y, val = "val", FUN = sd, ...){
  x <- raster::projectRaster(x, crs = ('+proj=longlat'))
  ras <- raster::projectRaster(x, crs = proj4string(y), over = TRUE)
  p <- raster::extract(ras, y, ...)
  m <- unlist(lapply(p, function(x) if (!is.null(x)) FUN(x, na.rm = TRUE) else NA))
  y$foo <- m
  names(y)[match("foo", names(y))] <- val
  y
}

######### Arguments #######################
# x: A raster file of the environmental variable
# y: Spatial polygons corresponding to the study area to extract the variable.
# val: String to rename the column with the extracted variable.
# FUN: The function to summarize the values (e.g. sd, mean)

3. Linear mixed-effects model

We fit a single linear mixed-effects model to analyse the effect of environmental heterogeneity on patterns of endemism (PE or WE) across grain sizes. A linear mixed-effects model allows the modeling of data as a combination of fixed effects, random effects and independent random error, and are especially useful when there is non-independence in the data.4 The standard form of a linear mixed-effects model is expressed as: \[Y_i = x_i\beta + s_i + Ɛ_i\] where \(Y_i\) represents the response variable at grid cell or location \(i\), \(x_i\) is a vector containing spatially referenced nonrandom predictors, and \(\beta\) is a vector of the fixed effect parameters, \(s_i\) is a spatially correlated random effect and \(Ɛ_i\) is the random measurement error. All explanatory variables were standardised prior to statistical analyses so that all variables had a standard deviation of 1 and a mean of 0. This ensures that the estimated coefficients are all on the same scale and for easier comparison of effect sizes.

The linear mixed-effects model was fit as a single model with all the variables in one model predicting endemism (PE or WE) as a function of the four environmental variables (temperature variation, precipitation variation, elevation variation and productivity variation), with continent identity of grid cells as a random effect, allowing us to include any idiosyncratic differences between continents. The model also includes a spatial covariate of geographical coordinates as an additional predictor variable to account for spatial autocorrelation. The spatial covariate was created as a matrix of the coordinates of each cell’s centroid corresponding to the geographical Cartesian \(x/y-\) coordinates (longitude and latitude)and was calculated with the function autocov_dist in the R package spdep.5 For each focal cell, we varied the weighting function and neighbourhood sizes using the next one to two cell neighbours to remove spatial autocorrelation (function nb2listw in spdep5). We used the maximum likelihood optimization criterion over restricted maximum likelihood, to allow significance testing via model comparison.

The linear mixed-effects model was fit in R v.3.6.3 with the lme function in the R package nlme6. The variations of environmental variables on endemism are presented as estimated coefficients of the fixed effects and their 95% confidence intervals in the R package nlme6.

In order to illustrate the process we have added all codes below for the analyses:

First, load the packages for the analysis:

# Function for computing environmental heterogeneity
clim_heterogeneity <- function(x, y ="y", nbs = 2000000,
                          x1 = "x1", x2 = "x2", x3 = "x3", x4 = "x4",
                          lon="lon", lat="lat", continent = "continent") {
  x <- as.data.frame(x)
  x <- x[complete.cases(x),]
  x <- x[, c(y, x1, x2, x3, x4, lon, lat, continent)]
  names(x) <- c("y", "x1", "x2", "x3", "x4", "lon", "lat", "continent")
  
  x$y <- scale(x$y, center = TRUE, scale = TRUE)
  x$x1 <- scale(x$x1, center = TRUE, scale = TRUE)
  x$x2 <- scale(x$x2, center = TRUE, scale = TRUE)
  x$x3 <- scale(x$x3, center = TRUE, scale = TRUE)
  x$x4 <- scale(x$x4, center = TRUE, scale = TRUE)

  # 1. Autocovariate regression: make a matrix of coordinates
  coords.mtrx <- as.matrix(x[, c(lon, lat)])

  # 2. Compute the autocovariate based on distance and weight above
  x$ac <- autocov_dist(as.vector(x$y), coords.mtrx, nbs = nbs,
                     type = "inverse.squared", zero.policy = TRUE)

  # 3. LME with autocovariate as additional predictor & continent as random effects
  mod1 <- lme(y ~ x1 + x2 + x3 + x4 + ac, random = ~1|continent, data = x, method="ML")

  df <- data.frame(intervals(mod1, which = "fixed")$fixed)
  fm <- anova.lme(mod1, type = "marginal")$'p-value'

  res <- cbind(df, p_val = fm)
  res <- res[!rownames(res) %in% c("(Intercept)", "ac"), ]
  rownames(res) <- c(x1, x2, x3, x4)

  res <- data.frame(clim = row.names(res), res, row.names = NULL)
  return(res)
}


######### Arguments #######################
# x: A data.frame
# y: A response variable
# nbs: Neighbourhood distance
# x1..x4 An explanatory variables corresponding to environmental variables
# lon: Column name of the longitude.
# lat: Column name of the latitude.
# continent: Column name of the column containing continent identity for grid cells

fun1 <- function(x){
    res <- strsplit(x, "\\/")[[1]]
    res <- res[[length(res)]]
    res <- strsplit(res, "\\_")[[1]][2]
    res <- gsub(".csv", "", res)
    res
}


# Download the data frames from Dropbox and save in temporary folder
dir = tempdir()
setwd(dir)
download.file(url = "https://github.com/darunabas/extras/archive/master.zip",
              destfile = "extras-master.zip")
unzip(zipfile = "extras-master.zip")
setwd(dir = "extras-master")
list.files()
## [1] "ENVIRONMENT" "mergedSA"    "shapefile"
t1 <- list.files('ENVIRONMENT/birds/WE', pattern="*.csv", full.names=TRUE)
t2 <- list.files('ENVIRONMENT/birds/PE', pattern="*.csv", full.names=TRUE)
t3 <- list.files('ENVIRONMENT/amphibians/WE', pattern="*.csv", full.names=TRUE)
t4 <- list.files('ENVIRONMENT/amphibians/PE', pattern="*.csv", full.names=TRUE)

f1 <- unlist(lapply(t1, fun1))
f2 <- unlist(lapply(t2, fun1))
f3 <- unlist(lapply(t3, fun1))
f4 <- unlist(lapply(t4, fun1))

# load the data frames for each grain size
d1 <- lapply(t1, function(x) read.csv(x, stringsAsFactors = FALSE))
d2 <- lapply(t2, function(x) read.csv(x, stringsAsFactors = FALSE))
d3 <- lapply(t3, function(x) read.csv(x, stringsAsFactors = FALSE))
d4 <- lapply(t4, function(x) read.csv(x, stringsAsFactors = FALSE))

head(d1[[1]])
##    grids        WE    sd_ALT    sd_MAP   sd_MAT       sd_NPP      lon      lat
## 1 v10084  340.2460  562.8624   71.9370 3.848896  15284319418 -7052646 -1737713
## 2 v10088  606.7684  404.9584   50.2280 2.356266  32349944611 -6852646 -1737713
## 3 v10089 1143.1800  767.3688  395.3724 3.936351  18044787059 -6802646 -1737713
## 4 v10090 1437.5099 1187.8156 1062.3708 5.590839  63536778570 -6752646 -1737713
## 5 v10091 1488.2875  348.5838  380.4115 1.503277 203536278064 -6702646 -1737713
## 6 v10092  867.2580  398.0916  191.2684 1.678670   3244896302 -6652646 -1737713

4. Visualisation of results

4.1. Environmental heterogeneity and bird weighted endemism

spo <- lapply(d1, function(x) clim_heterogeneity(x, y = "WE", x1 = "sd_ALT",
                                                 x2 = "sd_MAP", x3 = "sd_MAT",
                                                 x4 = "sd_NPP", nbs = 2000000))
ll <- lengths(spo)-1
z1 <- data.frame(do.call("rbind", spo), scale=rep(f1, ll))

pd <- position_dodge(width = 0.2)

ggplot(z1, aes(x=scale, y=est., group=clim, color=clim)) +
    geom_line(position = pd) +
    geom_point(size = 2, position = pd)+
    geom_errorbar(aes(ymax = upper, ymin = lower),
                  size  = .4,
                  width = .15,
                  position = pd)+
    theme_classic() +
    scale_color_manual(values=c('#1b9e77','#d95f02', "#7570b3", "#e7298a"))+
    labs(x = "", y = "Estimated effects")

Plot of environmental heterogeneity on bird weighted endemism

4.2. Environmental heterogeneity and bird phylogenetic endemism

spo <- lapply(d2, function(x) clim_heterogeneity(x, y = "PE", x1 = "sd_ALT",
                                                 x2 = "sd_MAP", x3 = "sd_MAT",
                                                 x4 = "sd_NPP"))
ll <- lengths(spo)-1
z1 <- data.frame(do.call("rbind", spo), scale=rep(f2, ll))

pd <- position_dodge(width = 0.2)

ggplot(z1, aes(x=scale, y=est., group=clim, color=clim)) +
    geom_line(position = pd) +
    geom_point(size = 2, position = pd)+
    geom_errorbar(aes(ymax = upper, ymin = lower),
                  size  = .4,
                  width = .15,
                  position = pd)+
    theme_classic() +
    scale_color_manual(values=c('#1b9e77','#d95f02', "#7570b3", "#e7298a"))+
    labs(x = "", y = "Estimates of fixed effects")

Plot of environmental heterogeneity on bird phylogenetic endemism

4.3. Environmental heterogeneity and amphibian weighted endemism

spo <- lapply(d3, function(x) clim_heterogeneity(x, y = "WE", x1 = "sd_ALT",
                                                 x2 = "sd_MAP", x3 = "sd_MAT",
                                                 x4 = "sd_NPP"))
ll <- lengths(spo)-1
z1 <- data.frame(do.call("rbind", spo), scale=rep(f3, ll))

pd <- position_dodge(width = 0.2)

ggplot(z1, aes(x=scale, y=est., group=clim, color=clim)) +
    geom_line(position = pd) +
    geom_point(size = 2, position = pd)+
    geom_errorbar(aes(ymax = upper, ymin = lower),
                  size  = .4,
                  width = .15,
                  position = pd)+
    theme_classic() +
    scale_color_manual(values=c('#1b9e77','#d95f02', "#7570b3", "#e7298a"))+
    labs(x = "", y = "Estimates of fixed effects")

Plot of environmental heterogeneity on amphibian weighted endemism

4.4. Environmental heterogeneity and amphibian phylogenetic endemism

spo <- lapply(d4, function(x) clim_heterogeneity(x, y = "PE", x1 = "sd_ALT",
                                                 x2 = "sd_MAP", x3 = "sd_MAT",
                                                 x4 = "sd_NPP"))
ll <- lengths(spo)-1
z1 <- data.frame(do.call("rbind", spo), scale=rep(f4, ll))

pd <- position_dodge(width = 0.2)

ggplot(z1, aes(x=scale, y=est., group=clim, color=clim)) +
    geom_line(position = pd) +
    geom_point(size = 2, position = pd)+
    geom_errorbar(aes(ymax = upper, ymin = lower),
                  size  = .4,
                  width = .15,
                  position = pd)+
    theme_classic() +
    scale_color_manual(values=c('#1b9e77','#d95f02', "#7570b3", "#e7298a"))+
    labs(x = "", y = "Estimates of fixed effects")

Plot of environmental heterogeneity on amphibian phylogenetic endemism

Session Information

## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] nlme_3.1-145  ggplot2_3.3.0 spdep_1.1-3   sf_0.8-1      spData_0.3.3  sp_1.4-1     
## [7] knitr_1.28   
## 
## loaded via a namespace (and not attached):
##  [1] gtools_3.8.1       tidyselect_1.0.0   xfun_0.12          purrr_0.3.3       
##  [5] splines_3.6.1      lattice_0.20-40    colorspace_1.4-1   expm_0.999-4      
##  [9] htmltools_0.4.0    yaml_2.2.1         rlang_0.4.5        e1071_1.7-3       
## [13] pillar_1.4.3       glue_1.3.2         withr_2.1.2        DBI_1.1.0         
## [17] lifecycle_0.2.0    stringr_1.4.0      munsell_0.5.0      gtable_0.3.0      
## [21] raster_3.0-12      codetools_0.2-16   coda_0.19-3        evaluate_0.14     
## [25] labeling_0.3       class_7.3-15       highr_0.8          Rcpp_1.0.3        
## [29] KernSmooth_2.23-16 scales_1.1.0       classInt_0.4-2     gdata_2.18.0      
## [33] farver_2.0.3       deldir_0.1-25      digest_0.6.25      stringi_1.4.6     
## [37] gmodels_2.18.1     dplyr_0.8.5        grid_3.6.1         tools_3.6.1       
## [41] LearnBayes_2.15.1  magrittr_1.5       tibble_2.1.3       crayon_1.3.4      
## [45] pkgconfig_2.0.3    MASS_7.3-51.5      Matrix_1.2-18      assertthat_0.2.1  
## [49] rmarkdown_2.1      R6_2.4.1           boot_1.3-24        units_0.6-5       
## [53] compiler_3.6.1

REFERENCES

1. Hijmans, R. J., Cameron, S. E., Parra, J. L., Jones, P. G. & Jarvis, A. Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology 25, 1965–1978 (2005).

2. Hijmans, R. J. Raster: Geographic data analysis and modeling. (2020).

3. Daru, B. H., Karunarathne, P. & Schliep, K. PhyloregionExtras: Extras for biogeographic regionalization and spatial conservation. (2020).

4. Verbeke, G. & Molenberghs, G. Linear mixed models for longitudinal data. in Linear mixed models in practice 63–153 (Springer-Verlag, 2000).

5. Bivand, R. & Wong, D. W. S. Comparing implementations of global and local indicators of spatial association. TEST 27, 716–748 (2018).

6. Pinheiro, J., Bates, D., DebRoy, S., Sarkar, D. & R Core Team. nlme: Linear and nonlinear mixed effects models. (2020).