Beyond estimates of occurrence and relative abundance, the eBird Status and Trends products contain information about predictor importance, predictor directionality (also known as partial dependencies), as well as predictive performance metrics (PPMs). The PPMs can be used to evaluate statistical performance of the models, either over the entire spatiotemporal extent of the model results, or for a specific region and season. Predictor Importances (PIs) and Partial Dependencies (PDs) can be used to understand relationships between occurrence and abundance and predictors, most notably the land cover variables used in the model. When PIs and PDs are combined, we can depict habitat association and avoidance, as well as the strength of the relationship with those habitats. The functions described in this section help load this data from the results packages, give tools for assessing predictive performance, and synthesize information about predictor importances and partial dependencies.
IMPORTANT. AFTER DOWNLOADING THE RESULTS, DO NOT CHANGE THE FILE STRUCTURE. All functionality in this package relies on the structure inherent in the delivered results. Changing the folder and file structure will cause errors with this package.
Data are stored in four files, found under
/<run_name>/results/abund_preds/unpeeled_folds, as described below.
ebirdst package provides functions for accessing these, such that you should never have to handle them manually, granted that the original file structure of the results is maintained. These data are stored at “stixel centroids,” which are the centers of the independent, partitioned, regional model extents (see Fink et al. 2010, Fink et al. 2013).
The first step when working with stixel centroid data is to load the Predictor Importances (PIs) and the Partial Dependencies (PDs). These files will be used for all of the functions in this vignette and are the input to many of the functions in
library(ebirdst) # DOWNLOAD DATA # Currently, example data is available on a public s3 bucket. The following # ebirdst_download() function copies the species results to a selected path and # returns the full path of the results. Please note that the example_data is # for Yellow-bellied Sapsucker and has the same run code as the real data, # so if you download both, make sure you put the example_data somewhere else. # Because the non-raster data is large, there is a parameter on the # ebirdst_download function that defaults to downloading only the raster data. # To access the non-raster data, set tifs_only = FALSE. sp_path <- ebirdst_download(species = "example_data", tifs_only = FALSE) print(sp_path) #>  "/Users/mes335/Library/Application Support/ebirdst/yebsap-ERD2016-EBIRD_SCIENCE-20180729-7c8cec83" pis <- load_pis(sp_path) pds <- load_pds(sp_path)
When working with Predictive Performance Metrics (PPMs), PIs, and/or PDs, it is very common to select a subset of space and time for analysis. In
ebirdst this is done by creating a spatiotemporal extent object with
ebirdst_extent(). These objects define the region and season for analysis and are passed to many functions in
ebirdst. To review the available stixel centroids associated with both PIs and PDs and to see which have been selected by a spatiotemporal subset, use the
map_centroids function, which will map and summarize this information.
calc_effective_extent() will analyze a spatiotemporal subset of PIs or PDs and plot the selected stixel centroids, as well as a
RasterLayer depicting where a majority of the information is coming from. The map ranges from 0 to 1, with pixels have a value of 1 meaning that 100% of the selected stixels are contributing information at that pixel. The function returns the
RasterLayer in addition to mapping.
Once predictive performance has been evaluated, exploring information about habitat association and/or avoidance can be done using the PIs and PDs. The
plot_pis() function generates a bar plot showing a rank of the most important predictors within a spatiotemporal subset. There is an option to show all predictors or to aggregate FRAGSTATS by the land cover types.
Beyond confidence intervals provided for the abundance estimates, the centroid data can also be used to calculate predictive performance metrics, to get an idea as to whether there is substantial statistical performance to evaluate information provided by PIs and PDs (as well as abundance and occurrence information).
plot_binary_by_time() function analyzes a species’ entire range of data and plots predictive performance metrics by a custom time interval (typically either 52 for weeks or 12 for months).
plot_all_ppms() function provides all available predictive performance metrics and is important for determining predictive performance within a spatiotemporal subset region and season. This function is also useful in comparing the performance between subsets.
Fink, D., Damoulas, T., & Dave, J. (2013, July). Adaptive Spatio-Temporal Exploratory Models: Hemisphere-wide species distributions from massively crowdsourced eBird data. In AAAI.
Fink, D., Hochachka, W. M., Zuckerberg, B., Winkler, D. W., Shaby, B., Munson, M. A., … & Kelling, S. (2010). Spatiotemporal exploratory models for broad‐scale survey data. Ecological Applications, 20(8), 2131-2147.