dynamicSDM: Explanatory variable data

library(dynamicSDM)

Stage 2: Explanatory variable data

In this tutorial, we will be extracting spatio-temporally buffered explanatory variables for each occurrence and pseudo-absence record. The dynamicSDM functions for extracting such variables require Google Earth Engine and Google Drive to be initialised. Fill in the code below with your Google account email, and run the code to check that rgee and googledrive have been correctly installed and authorised.

library(rgee)
rgee::ee_check()

library(googledrive)
googledrive::drive_user()

# Set your user email here
#user.email<-"your_google_email_here"

Note: You will need an internet connection for this tutorial. Variable extraction may take some time, depending on the strength of your connection. If you would rather move straight on to the next tutorial after trying these functions, you can read the extracted data into your R environment from the dynamicSDM package instead.

Directory organisation

We will be extracting data for three dynamic explanatory variables. Let’s first create new folders within the project directory to export extracted variable data to.

project_directory <- file.path(tempdir(), "dynamicSDM_vignette")

dir.create(project_directory)
#> Warning in dir.create(project_directory): '/tmp/Rtmpu2MKeg/dynamicSDM_vignette'
#> already exists

variablenames <- c("eight_sum_prec", "year_sum_prec", "grass_crop_percentage")

extraction_directories <- file.path(project_directory, "extraction")
dir.create(extraction_directories)

extraction_directory_1 <- file.path(project_directory, variablenames[1])
dir.create(extraction_directory_1)

extraction_directory_2 <- file.path(project_directory, variablenames[2])
dir.create(extraction_directory_2)

extraction_directory_3 <- file.path(project_directory, variablenames[3])
dir.create(extraction_directory_3)
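
The same folder layout can also be built in one pass. The following is a minimal base R sketch, not part of dynamicSDM: `demo_directory` and `demo_dirs` are illustrative names, and the sketch writes to its own temporary folder so that it does not interfere with the directories created above.

```r
# Hypothetical demo folder, separate from the tutorial's project_directory
demo_directory <- file.path(tempdir(), "dynamicSDM_dir_demo")
variablenames <- c("eight_sum_prec", "year_sum_prec", "grass_crop_percentage")

# One sub-folder path per variable
demo_dirs <- file.path(demo_directory, variablenames)

# recursive = TRUE also creates the parent folder; showWarnings = FALSE
# silences the "already exists" warning when the code is re-run
for (d in demo_dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)

dir.exists(demo_dirs)
```

Setting `showWarnings = FALSE` is a convenience when a tutorial is run more than once in the same R session.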

Now, the filtered occurrence and pseudo-absence record data frame generated in the first tutorial can be imported or read into your R environment from the dynamicSDM package.

# sample_filt_data<-read.csv(paste0(project_directory,"/filtered_quelea_occ.csv"))
data(sample_filt_data)

a) Extract dynamic explanatory variables

extract_dynamic_coords() extracts processed remote sensing data using the Google Earth Engine cloud servers. There are various arguments to this function to specify the explanatory variable including:

• datasetname: the dataset’s Google Earth Engine catalogue name.

• bandname : the band of interest within the dataset.

• temporal.res : the temporal resolution (i.e. the number of days to calculate the variable over).

• temporal.direction: temporal direction (days either prior or post each record’s date).

• spatial.res.metres: spatial resolution (the resolution in metres to extract data at).

• GEE.math.fun : the mathematical function to calculate across the period (e.g. mean, sum or standard deviation across the given period).
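
To make the temporal arguments concrete, the sketch below works out the date window implied by temporal.res and temporal.direction for a single record, using base R only. The record date is hypothetical, and whether dynamicSDM treats the window end points as inclusive is not stated here, so this illustrates the idea rather than the package's exact behaviour.

```r
# Hypothetical record date, used only for illustration
record_date <- as.Date("2005-03-14")
temporal_res <- 56  # days, i.e. 8 weeks

# temporal.direction = "prior": the window runs up to the record date
prior_start <- record_date - temporal_res
prior_end <- record_date

# temporal.direction = "post": the window runs forward from the record date
post_start <- record_date
post_end <- record_date + temporal_res

format(c(prior_start, prior_end))
format(c(post_start, post_end))
```

GEE.math.fun is then applied to all values of the chosen band that fall within this window.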

Case study

The distribution of our case study species, the red-billed quelea, is driven by precipitation levels. Run the code below to extract the sum of precipitation across the 8-week and 52-week periods prior to each record from the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) dataset on Google Earth Engine.

For the 8-week precipitation extraction, we will use the split method to save extracted data. Notice how each record’s data are extracted and exported individually. If you specify resume = TRUE, extraction can pick up where it left off after a lost internet connection.

# 8-week total precipitation
extract_dynamic_coords(occ.data = sample_filt_data,
                       datasetname = "UCSB-CHG/CHIRPS/DAILY",
                       bandname = "precipitation",
                       spatial.res.metres = 5566,
                       GEE.math.fun = "sum",
                       temporal.direction = "prior",
                       temporal.res = 56,
                       save.method = "split",
                       varname = variablenames[1],
                       save.directory = extraction_directory_1)

For the 52-week precipitation extraction, we will use the combined method to save extracted data. Here, all data are extracted and then exported as a single data frame. This approach writes fewer files but is more vulnerable to a lost internet connection, because all progress is lost and extraction cannot be resumed.

# 52-week total precipitation
extract_dynamic_coords(occ.data = sample_filt_data,
                       datasetname = "UCSB-CHG/CHIRPS/DAILY",
                       bandname = "precipitation",
                       spatial.res.metres = 5566,
                       GEE.math.fun = "sum",
                       temporal.direction = "prior",
                       temporal.res = 364,
                       save.method = "combined",
                       varname = variablenames[2],
                       save.directory = extraction_directory_2)
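
The trade-off between the two save methods can be illustrated without contacting Google Earth Engine. The sketch below is a conceptual base R simulation and not dynamicSDM's internal code: `extract_one()`, `run_split()`, the output folder and the file naming are all hypothetical. It shows why writing one file per record makes an interrupted run resumable.

```r
# Hypothetical stand-in for a slow remote extraction of one record
extract_one <- function(record_id) {
  data.frame(id = record_id, value = record_id * 10)
}

# Hypothetical output folder and file naming; the package's may differ
out_dir <- file.path(tempdir(), "split_demo")
dir.create(out_dir, showWarnings = FALSE)
unlink(list.files(out_dir, full.names = TRUE))  # start from an empty folder

record_ids <- 1:5

run_split <- function(ids, stop_after = Inf) {
  done <- 0
  for (id in ids) {
    out_file <- file.path(out_dir, paste0("record_", id, ".csv"))
    if (file.exists(out_file)) next   # resume: skip records already saved
    if (done >= stop_after) break     # simulate a lost connection
    write.csv(extract_one(id), out_file, row.names = FALSE)
    done <- done + 1
  }
  invisible(done)
}

first_run <- run_split(record_ids, stop_after = 2)  # "connection" drops early
second_run <- run_split(record_ids)                 # picks up the remainder
length(list.files(out_dir))
```

With a single combined export there is no per-record file to check, so an interrupted run has to start again from the first record.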

b) Extract spatially buffered explanatory variables

extract_buffered_coords() extracts explanatory variable data across a spatial buffer around each occurrence record’s coordinates. These variables can be categorical or continuous, but if a temporal buffer is also used, only continuous data will work. This function utilises a “moving window matrix” that specifies the neighbourhood of cells (the spatial buffer area) surrounding each occurrence record’s cell that will also be included in the calculation. get_moving_window() generates the optimal “moving window matrix” size based upon a given spatial radius and the resolution of the remote-sensing data.
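
As a rough illustration of what a moving window matrix does, the base R sketch below sums the values of a record’s cell and its eight neighbours on a small toy raster. The `cells` grid and the `focal_sum()` helper are hypothetical and written for this example; they are not dynamicSDM functions, and the package’s own calculation runs on Google Earth Engine.

```r
# Hypothetical 5 x 5 raster (1 = cell of interest, 0 = other)
cells <- matrix(c(0, 1, 0, 0, 1,
                  1, 1, 0, 0, 0,
                  0, 1, 1, 0, 0,
                  0, 0, 0, 1, 0,
                  1, 0, 0, 0, 0),
                nrow = 5, byrow = TRUE)

# 3 x 3 moving window of ones: the record's cell plus its eight neighbours
moving_window <- matrix(1, nrow = 3, ncol = 3)

# Sum over the window centred on a given cell (interior cells only)
focal_sum <- function(x, w, row, col) {
  half <- (nrow(w) - 1) / 2
  block <- x[(row - half):(row + half), (col - half):(col + half)]
  sum(block * w)
}

focal_sum(cells, moving_window, row = 3, col = 3)
```

A larger radius or a finer resolution would call for a larger window, which is the sizing decision that get_moving_window() makes.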

Case study

The distribution of red-billed quelea is driven by the availability of wild grass and cereal crop seed. The code below extracts the total number of grassland or cereal cropland cells across a spatial buffer from the MODIS Annual Land Cover Type dataset in the Google Earth Engine catalogue.

First, however, we must generate the optimal moving window matrix for this calculation, based upon the fact that quelea travel up to 10 km to access resources and that the data will be at 0.05 degree resolution (500 m aggregated by a factor of 12).

matrix <- get_moving_window(radial.distance = 10000,
                            spatial.res.degrees = 0.05,
                            spatial.ext = c(-35, -6, 10, 40))
matrix
#>      [,1] [,2] [,3]
#> [1,]    1    1    1
#> [2,]    1    1    1
#> [3,]    1    1    1

# Total grassland and cereal cropland cells in surrounding area
extract_buffered_coords(occ.data = sample_filt_data,
                        datasetname = "MODIS/006/MCD12Q1",
                        bandname = "LC_Type5",
                        spatial.res.metres = 500,
                        GEE.math.fun = "sum",
                        moving.window.matrix = matrix,
                        user.email = user.email,
                        save.method = "split",
                        temporal.level = "year",
                        categories = c(6, 7),
                        agg.factor = 12,
                        varname = variablenames[3],
                        save.directory = extraction_directory_3)
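
The role of the categories argument with a categorical band can be sketched in base R. This is a conceptual illustration under the assumption that matching cells are counted after a binary reclassification; the `land_cover` values are made up, and the snippet does not reproduce the aggregation step or the package’s internal code.

```r
# Hypothetical land cover codes for nine cells around one record
land_cover <- c(6, 7, 2, 6, 5, 7, 7, 9, 6)

# categories = c(6, 7): the grassland and cereal cropland classes used above
categories <- c(6, 7)

# Binary reclassification: 1 where the cell matches a category, else 0
is_target <- as.integer(land_cover %in% categories)

# With GEE.math.fun = "sum", the buffered value is the count of matching cells
sum(is_target)
```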

c) Combine explanatory variable data

Data for each explanatory variable have been saved across multiple directories and files. extract_coords_combine() combines the extracted explanatory variable data into a single data frame.

complete.dataset <- extract_coords_combine(varnames = variablenames,
                                           local.directory = c(extraction_directory_1,
                                                               extraction_directory_2,
                                                               extraction_directory_3))
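
Conceptually, combining the variables amounts to joining per-variable tables on the columns that identify each record. The base R sketch below shows that idea with toy data. The key columns (`x`, `y`, `year`, `month`, `day`) and all values are assumptions made for illustration, and this is not how extract_coords_combine() is necessarily implemented.

```r
# Assumed identifying columns for each record (hypothetical)
key_cols <- c("x", "y", "year", "month", "day")

# Two toy records
records <- data.frame(x = c(20.1, 25.3), y = c(-30.2, -28.7),
                      year = c(2005, 2010), month = c(3, 7), day = c(14, 2))

# One toy table per extracted variable
prec_8wk <- cbind(records, eight_sum_prec = c(41.2, 3.5))
prec_52wk <- cbind(records, year_sum_prec = c(512.0, 298.4))
# Only the first record has a land cover value in this toy example
grass <- cbind(records[1, ], grass_crop_percentage = 37)

# Join the tables one after another, keeping every record
combined <- Reduce(function(a, b) merge(a, b, by = key_cols, all = TRUE),
                   list(prec_8wk, prec_52wk, grass))
combined
```

Records that are missing from one table come through with NA in that variable’s column, so the combined data frame is worth checking for missing values.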

Summary

At the end of this vignette, we now have a complete data frame of filtered species occurrence and pseudo-absence records with associated extracted dynamic variables. Let’s save this to our project directory for use in the next tutorial!

# Set NA values to zero
complete.dataset[is.na(complete.dataset$grass_crop_percentage), "grass_crop_percentage"] <- 0

write.csv(complete.dataset, file = paste0(project_directory, "/extracted_quelea_occ.csv"))
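
Before moving on, it can be worth confirming that a saved file reads back as expected. The sketch below uses a small made-up data frame and a temporary file rather than the real dataset; `demo` and `out_file` are illustrative names. It also sets row.names = FALSE, which is an optional choice that stops write.csv() from adding an extra column of row names.

```r
# Toy data frame standing in for the combined dataset
demo <- data.frame(id = 1:3, grass_crop_percentage = c(12, NA, 30))

# Replace missing values with zero, as in the step above
demo$grass_crop_percentage[is.na(demo$grass_crop_percentage)] <- 0

# Write to a temporary file without a row-name column
out_file <- file.path(tempdir(), "extracted_demo.csv")
write.csv(demo, file = out_file, row.names = FALSE)

# Read back and check column names and missing values
reloaded <- read.csv(out_file)
identical(names(reloaded), names(demo))
anyNA(reloaded$grass_crop_percentage)
```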