Getting Started

Matthew Cornell

2020-04-01

Getting Started with zoltr

zoltr is an R package that simplifies access to the zoltardata.com API. This vignette takes you through the package’s main features. The examples use the example project from docs.zoltardata.com, which should always be available for public read-only access.

NOTE: You will need an account to access the Zoltar API. Please see docs.zoltardata.com for details.

Connect to the host and authenticate

The starting point for working with Zoltar’s API is a ZoltarConnection object, obtained via the new_connection function. Most zoltr functions take a ZoltarConnection along with the API URL of the thing of interest, e.g., a project, model, or forecast. API URLs look like https://www.zoltardata.com/api/project/3/, which is that of the “Docs Example Project”. An important note regarding URLs:

zoltr's convention for URLs is to require a trailing slash character ('/') on all URLs. The only exception is the
optional `host` parameter passed to `new_connection()`. Thus, `https://www.zoltardata.com/api/project/3/` is valid,
but `https://www.zoltardata.com/api/project/3` is not.

You can obtain a URL using some of the *_info functions, and you can always use the web interface to navigate to the item of interest and look at its URL in the browser address field. Keep in mind that you’ll need to insert api/ right after the host in the browsable address, and add the trailing slash character. For example, if you browsed the Docs Example Project at (say) https://www.zoltardata.com/project/3 then its API URL for use in zoltr would be https://www.zoltardata.com/api/project/3/.
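The conversion from a browsable address to an API URL can be sketched in base R. The helper name to_api_url is hypothetical and not part of zoltr; it only illustrates the two rules stated above (an api/ segment after the host, and a trailing slash).

```r
# Hypothetical helper (NOT part of zoltr): turn a browsable Zoltar address
# into the API URL form that zoltr expects.
to_api_url <- function(browser_url) {
  api_url <- browser_url
  # Insert "api/" right after the host unless it is already present.
  if (!grepl("^https?://[^/]+/api/", api_url)) {
    api_url <- sub("^(https?://[^/]+)/", "\\1/api/", api_url)
  }
  # Strip any trailing slashes, then add exactly one.
  paste0(sub("/+$", "", api_url), "/")
}

to_api_url("https://www.zoltardata.com/project/3")
#> [1] "https://www.zoltardata.com/api/project/3/"
```

A URL that is already in API form passes through unchanged, so the helper is safe to apply to either kind of address.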

As noted above, all API calls require an account. To access your project, you’ll first need to authenticate via the zoltar_authenticate() function, passing it the username and password for your account.

For this and other vignettes, you will need to create an .Renviron file that contains Z_USERNAME and Z_PASSWORD variables that match your account settings (note the Z_ prefix). Then you’ll be able to create an authenticated connection:

library(zoltr)
zoltar_connection <- new_connection()
zoltar_authenticate(zoltar_connection, Sys.getenv("Z_USERNAME"), Sys.getenv("Z_PASSWORD"))
zoltar_connection
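Sys.getenv() returns an empty string when a variable is missing, so a misnamed entry in .Renviron would only surface later as an authentication failure. A minimal sketch of an early check follows; require_env is a hypothetical helper, not a zoltr function.

```r
# Hypothetical helper (NOT part of zoltr): stop with a clear message when a
# required environment variable is unset or empty.
require_env <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", name,
         " is not set; add it to your .Renviron file and restart R.")
  }
  value
}

# Usage (commented out because it needs a real account):
# zoltar_authenticate(zoltar_connection,
#                     require_env("Z_USERNAME"), require_env("Z_PASSWORD"))
```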

Get a list of all projects on the host

Now that you have a connection, you can use the projects() function to get all projects as a data.frame. Note that it will only list those that you are authorized to access, i.e., all public projects plus any private ones that you own or for which you are a model owner.

the_projects <- projects(zoltar_connection)
str(the_projects)
#> 'data.frame':    4 obs. of  10 variables:
#>  $ id                   : int  5 232 233 234
#>  $ url                  : chr  "http://127.0.0.1:8000/api/project/5/" "http://127.0.0.1:8000/api/project/232/" "http://127.0.0.1:8000/api/project/233/" "http://127.0.0.1:8000/api/project/234/"
#>  $ owner_url            : chr  "http://127.0.0.1:8000/api/user/4/" "http://127.0.0.1:8000/api/user/4/" "http://127.0.0.1:8000/api/user/4/" "http://127.0.0.1:8000/api/user/4/"
#>  $ public               : logi  FALSE TRUE FALSE TRUE
#>  $ name                 : chr  "Impetus Province Forecasts" "public project" "private project" "Docs Example Project"
#>  $ description          : chr  "Impetus Project forecasts for real-time dengue hemorrhagic fever (DHF) in Thailand. Beginning in May 2017, this"| __truncated__ "description" "description" "A full description of my project is here. You could include narrative details about what seasons are included, "| __truncated__
#>  $ home_url             : chr  "http://www.iddynamics.jhsph.edu/projects/impetus" "http://example.com/" "http://example.com/" "https://reichlab.io"
#>  $ time_interval_type   : chr  "Biweek" "Week" "Week" "Week"
#>  $ visualization_y_label: chr  "DHF cases" "" "" "the scale for your variable of interest"
#>  $ core_data            : chr  "https://github.com/reichlab/dengue-data" "http://example.com/" "http://example.com/" ""
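Because the result is an ordinary data.frame, the usual base R subsetting applies. The sketch below filters on the public column; the_projects_demo is a stand-in built from three of the columns printed above, so that the example runs without a connection.

```r
# Stand-in for the data.frame returned by projects(): only three columns are
# reproduced, with values copied from the str() output above.
the_projects_demo <- data.frame(
  id = c(5L, 232L, 233L, 234L),
  public = c(FALSE, TRUE, FALSE, TRUE),
  name = c("Impetus Province Forecasts", "public project",
           "private project", "Docs Example Project"),
  stringsAsFactors = FALSE
)

# Keep only the public projects and show their ids and names.
public_projects <- the_projects_demo[the_projects_demo$public, c("id", "name")]
public_projects
```

With a live connection, the same subsetting expression should work on the_projects directly.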

Get a project to work with and list its info, models, and scores

Let’s start by getting a public project to work with. We will search the projects list for it by name. Then we will pass the project’s URL to the project_info() function to get a list of details, and pass the same URL to the models() function to get a data.frame of the project’s models.

project_url <- the_projects[the_projects$name == "Docs Example Project", "url"]
the_project_info <- project_info(zoltar_connection, project_url)
names(the_project_info)
#>  [1] "id"                    "url"                   "owner"                
#>  [4] "is_public"             "name"                  "description"          
#>  [7] "home_url"              "logo_url"              "core_data"            
#> [10] "time_interval_type"    "visualization_y_label" "truth"                
#> [13] "model_owners"          "score_data"            "models"               
#> [16] "units"                 "targets"               "timezeros"
the_project_info$description
#> [1] "A full description of my project is here. You could include narrative details about what seasons are included, what group has provided data, whether the project focuses on real-time or retrospective forecasts."

the_models <- models(zoltar_connection, project_url)
str(the_models)
#> 'data.frame':    1 obs. of  8 variables:
#>  $ id          : int 230
#>  $ url         : chr "http://127.0.0.1:8000/api/model/230/"
#>  $ project_url : chr "http://127.0.0.1:8000/api/project/234/"
#>  $ owner_url   : logi NA
#>  $ name        : chr "docs forecast model"
#>  $ description : chr ""
#>  $ home_url    : chr ""
#>  $ aux_data_url: logi NA
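Looking up a URL by name silently yields a zero-length vector when no name matches, and several URLs when names are duplicated. A small guard makes both cases explicit. The function find_url_by_name is a hypothetical helper, not part of zoltr, and demo_models is a stand-in that reuses values from the output above.

```r
# Hypothetical helper (NOT part of zoltr): return the URL of the single row
# whose name matches, or stop if there is not exactly one match.
find_url_by_name <- function(df, name) {
  matches <- df$url[which(df$name == name)]
  if (length(matches) != 1) {
    stop("Expected exactly one row named '", name, "', found ", length(matches))
  }
  matches
}

# Stand-in for the data.frame returned by models().
demo_models <- data.frame(
  name = "docs forecast model",
  url = "http://127.0.0.1:8000/api/model/230/",
  stringsAsFactors = FALSE
)

find_url_by_name(demo_models, "docs forecast model")
#> [1] "http://127.0.0.1:8000/api/model/230/"
```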

There is other project-related information that you can access: the project’s configuration (via zoltar_units(), targets(), and timezeros(), concepts that are explained at docs.zoltardata.com), its scores (via scores()), and its truth data (via truth()). As an example, let’s get its score data. (Note that available scores are limited due to the nature of the example project.)

score_data <- scores(zoltar_connection, project_url)
score_data
#> # A tibble: 0 x 10
#> # … with 10 variables: model <chr>, timezero <chr>, season <chr>, unit <chr>,
#> #   target <chr>, error <chr>, abs_error <chr>, log_single_bin <chr>,
#> #   log_multi_bin <chr>, pit <chr>

Get a model to work with and list its info and forecasts

Now let’s work with a particular model, getting its URL by name and then passing it to the model_info() function to get details. Then use the forecasts() function to get a data.frame of that model’s forecasts (there is only one). Note that obtaining the model’s URL is straightforward because it is provided in the url column of the_models.

model_url <- the_models[the_models$name == "docs forecast model", "url"]
the_model_info <- model_info(zoltar_connection, model_url)
names(the_model_info)
#> [1] "id"           "url"          "project"      "owner"        "name"        
#> [6] "abbreviation" "description"  "home_url"     "aux_data_url"
the_model_info$name
#> [1] "docs forecast model"

the_forecasts <- forecasts(zoltar_connection, model_url)
str(the_forecasts)
#> 'data.frame':    1 obs. of  8 variables:
#>  $ id                : int 185
#>  $ url               : chr "http://127.0.0.1:8000/api/forecast/185/"
#>  $ forecast_model_url: chr "http://127.0.0.1:8000/api/model/230/"
#>  $ source            : chr "docs-predictions.json"
#>  $ timezero_url      : chr "http://127.0.0.1:8000/api/timezero/705/"
#>  $ created_at        : Date, format: "2020-04-14"
#>  $ notes             : chr "a small prediction file"
#>  $ forecast_data_url : chr "http://127.0.0.1:8000/api/forecast/185/data/"

Finally, download the forecast’s data

You can get a forecast’s data using the download_forecast() function, which returns it in a nested list format. Please see docs.zoltardata.com for forecast format details.

forecast_url <- the_forecasts[1, "url"]
the_forecast_info <- forecast_info(zoltar_connection, forecast_url)
forecast_data <- download_forecast(zoltar_connection, forecast_url)
length(forecast_data$predictions)
#> [1] 29
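Nested lists like this can be summarized with vapply(). The sketch below counts predictions by class. It assumes that each element of predictions carries unit, target, class, and prediction fields, as in the forecast format documented at docs.zoltardata.com. The list demo_forecast_data and its values are made up for illustration and are not the contents of the downloaded forecast.

```r
# Made-up stand-in for the list returned by download_forecast(); the field
# names (unit, target, class, prediction) are an assumption about the format.
demo_forecast_data <- list(
  predictions = list(
    list(unit = "location1", target = "pct next week", class = "point",
         prediction = list(value = 2.1)),
    list(unit = "location1", target = "cases next week", class = "point",
         prediction = list(value = 5)),
    list(unit = "location2", target = "pct next week", class = "sample",
         prediction = list(sample = c(2.3, 6.5, 0.0)))
  )
)

# Extract the class of every prediction, then tabulate the classes.
classes <- vapply(demo_forecast_data$predictions,
                  function(p) p$class, character(1))
table(classes)
```

If the assumption about field names holds, the same two lines should work on forecast_data$predictions.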