Extracing AgERA5 data with ag5Tools

Why a extracting function for agERA5 data?

The agERA5 data is downloaded from the Copernicus Climate Data Store as NetCDF files format, with a file extension .nc.

The data is provided as daily observations and each file correspond to a day. For instance, if you want precipitation data for year 2010, you will have 365 files (or 366 in the case of a leap year).

In case you want to use the precipitation data as model covariates, you have to seek for the specific dates and extract the data corresponding to the locations where the trial was established. Instead, you can use the ag5_extract function as demostrated in the next section.

Let’s say you have observations from field trials of a trait of interest (e.g., yield) and want to link that with rainfall data to explore the effect of rainfall on that trait. You know the the plating and harvest dates for each trial plot. Then you can extract the time series of daily rainfall starting at plating date and finishing at harvest date.

Our synthetic example data shows the dates for random locations in Arusha, Tanzania.


data("arusha_df", package = "ag5Tools")

head(arusha_df)
#>        lon       lat start_date   end_date
#> 1 35.72636 -2.197162 1991-04-22 1991-08-20
#> 2 36.10249 -2.850983 1990-01-24 1990-05-24
#> 3 35.46292 -3.602582 1991-03-06 1991-07-04
#> 4 36.29166 -3.855945 1990-10-10 1991-02-07
#> 5 35.45254 -3.616361 1990-01-22 1990-05-22
#> 6 35.40131 -3.216106 1990-10-19 1991-02-16

With ag5_extract() function you can extract the required data, as long as you have downloaded it already.


library(ag5Tools)

arusha_rainfall <- ag5_extract(coords = arusha_df,
                               variable = "Precipitation-Flux",
                               path = "D:/agera5_data/")

Notice that the data.frame arusha_df already has column names that match the default arguments in the function. If they do not match, you should indicate the column names as arguments of the function ag5_download().

For example, if your data.frame has location columns named x and y and dates named as planting_date and harvest_date, the call to function ag5_extract() will look like:

arusha_rainfall <- ag5_extract(coords = example_df,
                               lon = "x",
                               lat = "y",
                               start_date = "planting_date",
                               end_date = "harvest_date",
                               variable = "Precipitation-Flux",
                               path = "D:/agera5_data/")

Notice that you do not need to worry about providing the specific folder where the files are located, but only the root folder where you know the files are. For instance, if you stored all the AgERA5 data files in folder D:/agera5_data/, but you also have sub-folders for precipitation and temperature data, it is not required to specify that in the path argument. In this case, only providing the path D:/agera5_data/ will suffice.