Having developed a star database in R, we would like to exploit it directly in R to define multidimensional queries, but that is part of a future project. For now, we may be interested in deploying it in an RDBMS (Relational Database Management System) to use it with OLAP (On-Line Analytical Processing) query tools.
The deployment process consists of exporting the tables that make up the star databases to the RDBMS and also keeping them updated.
The vignette titled Incremental refresh of star databases,
vignette("v40-refresh")
, describes how star databases can
be periodically updated based on the new data obtained. These updates
are transmitted to the RDBMS automatically using the procedure described
in this document.
This document shows, by means of an example, the possibilities offered by the package in this context. First, the star database and the deployment processes are presented. The next section shows how the incremental refresh is applied and the result obtained in the relational databases where the deployments have been carried out. Finally, it finishes with the conclusions.
This section shows the starting star databases and their deployment process.
The starting star database is the content of the variable
mrs_db
, obtained in the vignette titled Obtaining and
transforming flat tables,
vignette("v05-flat-table-op")
. It contains the
constellation, formed by two star databases. Next we get their
names.
The code to generate the constellation from the initial data is available in the vignette. Below is a graphic representation of the tables that make it up.
To deploy, we need a file to store the star database. In a real case, it would be a file under our control in our folder tree: we only need a file name. For the vignette, let’s get a temporary one.
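For example, a temporary file can be obtained with base R's tempfile() function (the rds extension is an assumption, based on the later remark that the file can be read with readRDS()):

```r
# temporary file to store the star database object;
# the ".rds" extension is assumed from the later use of readRDS()
mrs_rdb_file <- tempfile(fileext = ".rds")
```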
We are going to carry out the deployment on SQLite. We will also need a file to store the database. As before, if we use this RDBMS, we would use a file under our control. For the vignette we will use a temporary one.
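The temporary file for the SQLite database can be obtained in the same way (the ".sqlite" extension is an assumption; SQLite accepts any file name):

```r
# temporary file for the SQLite database used in the deployment;
# the ".sqlite" extension is only a naming convention
mrs_sqlite_file <- tempfile(fileext = ".sqlite")
```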
The deployment functions must be able to access the database that we
use. To give them access to the database, we provide a connection
function (it must return an object of class DBIConnection
)
and a disconnection function (it must accept a parameter of class
DBIConnection
).
mrs_sqlite_connect <- function() {
DBI::dbConnect(RSQLite::SQLite(),
dbname = mrs_sqlite_file)
}
mrs_sqlite_disconnect <- function(con) {
DBI::dbDisconnect(con)
}
If the disconnection function is like the previous one, we can omit it, since it is the default.
With these parameters, we deploy with the deploy
function, as shown below.
mrs_db <- mrs_db |>
deploy(
name = "mrs_sqlite",
connect = mrs_sqlite_connect,
disconnect = mrs_sqlite_disconnect,
file = mrs_rdb_file
)
As a result, fact and dimension tables are exported to the database.
The star_database
object is stored (in the file indicated)
so that it corresponds to the version stored in the relational database.
Additionally, from this moment on, the star database and the relational
database are linked: refresh operations to the star databases will
automatically be propagated to the relational database.
Next, we are going to show the content of the star databases and the associated relational database.
For facts and dimensions the number of instances in the star databases is shown.
l_db <- mrs_db |>
as_tibble_list()
names <- sort(names(l_db))
for (name in names){
cat(sprintf("name: %s, %d rows\n", name, nrow(l_db[[name]])))
}
#> name: mrs_age, 18228 rows
#> name: mrs_cause, 3677 rows
#> name: when, 2076 rows
#> name: where, 122 rows
#> name: who, 5 rows
We get the same information for the relational database.
mrs_con <- mrs_sqlite_connect()
tables <- DBI::dbListTables(mrs_con)
for (t in tables) {
res <- DBI::dbGetQuery(mrs_con, sprintf('SELECT COUNT(*) FROM `%s`', t))
cat(sprintf("name: %s, %d rows\n", t, res[[1]]))
}
#> name: mrs_age, 18228 rows
#> name: mrs_cause, 3677 rows
#> name: when, 2076 rows
#> name: where, 122 rows
#> name: who, 5 rows
mrs_sqlite_disconnect(mrs_con)
We can see that the tables and their number of instances are the same in both cases.
We can perform more than one deployment associated with a star database. Next, we get a second temporary file for the new SQLite relational database.
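As before, the second temporary file can be obtained with tempfile() (the extension is an assumption):

```r
# second temporary file for the new SQLite relational database
mrs_sqlite_file_2 <- tempfile(fileext = ".sqlite")
```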
We need a new connection function to use the new file. We define it below.
mrs_sqlite_connect_2 <- function() {
DBI::dbConnect(RSQLite::SQLite(),
dbname = mrs_sqlite_file_2)
}
In this case we are going to use the default disconnection function and, since a previous deployment has already been carried out, we do not need to indicate a file to store the star database: it is already being stored in one. The call to the deployment function is as shown below.
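A sketch of that call, following the signature used in the first deployment (the deployment name "mrs_sqlite_2" is an assumption; only the name and the connection function are passed, since the disconnection function and the storage file take their defaults):

```r
# second deployment: default disconnection function, no file parameter
# (the star database is already stored in the file of the first deployment)
mrs_db <- mrs_db |>
  deploy(
    name = "mrs_sqlite_2",
    connect = mrs_sqlite_connect_2
  )
```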
We can consult the current deployments using the following function.
If necessary, we can also cancel a deployment using the
cancel_deployment()
function: the database is not affected,
but it will no longer receive subsequent updates.
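A minimal sketch of such a cancellation, assuming the deployment is identified by the name given in deploy() (the parameter name is an assumption, not confirmed by this document):

```r
# cancel the second deployment; the relational database file is left
# untouched, but it will no longer be refreshed
mrs_db <- mrs_db |>
  cancel_deployment(name = "mrs_sqlite_2")
```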
This section shows how incremental refresh updates are transferred from the star databases to the relational databases included in the deployments carried out.
To perform the incremental update, we take the data and process
presented in vignette Incremental refresh of star databases,
vignette("v40-refresh")
.
The update data is stored in the form of a flat table in the
package’s mrs_ft_new
variable. Below are the first records
of the table. We access the table using the get_table()
function for the object of the flat_table
class.
ft <- mrs_ft_new |>
get_table()
ft
#> # A tibble: 375 × 13
#> Year WEEK `Week Ending Date` REGION State City Pneumonia and Influe…¹
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1962 48 12/01/1962 2 NY Buffalo 5
#> 2 1963 3 01/19/1963 4 IA Des Moines 2
#> 3 1963 6 02/09/1963 8 CO Pueblo 0
#> 4 1963 7 02/16/1963 7 TX El Paso 0
#> 5 1963 25 06/22/1963 1 MA Springfie… 5
#> 6 1964 10 03/07/1964 1 MA Cambridge 3
#> 7 1964 12 03/21/1964 8 CO Colorado … 1
#> 8 1964 20 05/16/1964 3 IL Chicago 34
#> 9 1964 20 05/16/1964 8 CO Pueblo 0
#> 10 1964 27 07/04/1964 3 OH Akron 0
#> # ℹ 365 more rows
#> # ℹ abbreviated name: ¹`Pneumonia and Influenza Deaths`
#> # ℹ 6 more variables: `All Deaths` <chr>, `<1 year (all cause deaths)` <chr>,
#> # `1-24 years (all cause deaths)` <chr>, `25-44 years` <chr>,
#> # `45-64 years (all cause deaths)` <chr>,
#> # `65+ years (all cause deaths)` <chr>
We apply the modification process carried out on the star databases
to the data in the flat table. This process is stored in the
star_database
object itself, so we do not have to search for
the applied functions.
mrs_db_age_refresh <- mrs_ft_new |>
update_according_to(mrs_db, star = "mrs_age")
mrs_db_cause_refresh <- mrs_ft_new |>
update_according_to(mrs_db, star = "mrs_cause")
The result is that we replicate the structure of each of the components of the constellation on the new data. With this structure we can now carry out the refresh process itself for each of the star databases.
mrs_db <- mrs_db |>
incremental_refresh(mrs_db_age_refresh) |>
incremental_refresh(mrs_db_cause_refresh, existing_instances = "group")
In each case, using the existing_instances
parameter, we
can decide what to do with the instances that appear in the update and
that were already included in the star database.
To check the result obtained, we are going to obtain again the name of the tables and the number of instances for both the star databases and each of the deployments.
mrs_con <- mrs_sqlite_connect()
tables <- DBI::dbListTables(mrs_con)
for (t in tables) {
res <- DBI::dbGetQuery(mrs_con, sprintf('SELECT COUNT(*) FROM `%s`', t))
cat(sprintf("name: %s, %d rows\n", t, res[[1]]))
}
#> name: mrs_age, 18228 rows
#> name: mrs_cause, 3677 rows
#> name: when, 2076 rows
#> name: where, 122 rows
#> name: who, 5 rows
mrs_sqlite_disconnect(mrs_con)
mrs_con_2 <- mrs_sqlite_connect_2()
tables <- DBI::dbListTables(mrs_con_2)
for (t in tables) {
res <- DBI::dbGetQuery(mrs_con_2, sprintf('SELECT COUNT(*) FROM `%s`', t))
cat(sprintf("name: %s, %d rows\n", t, res[[1]]))
}
#> name: mrs_age, 18228 rows
#> name: mrs_cause, 3677 rows
#> name: when, 2076 rows
#> name: where, 122 rows
#> name: who, 5 rows
mrs_sqlite_disconnect(mrs_con_2)
We can observe that in all three cases the number of instances is the same and has changed with respect to the situation prior to carrying out the refresh operations.
In addition to updating the deployments in the relational databases,
the copy of the star databases is automatically kept updated in its
file. We can load it into a variable using the
load_star_database()
function and check that its content is
identical to that of the original star database.
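A sketch of this check, using the file name from the first deployment (load_star_database() is named above; verifying the result with identical() is an assumption about how the comparison is made):

```r
# reload the stored star database from its file
mrs_db2 <- load_star_database(mrs_rdb_file)

# the reloaded object should match the in-memory star database
identical(mrs_db, mrs_db2)
```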
If the file has the rds extension, the readRDS()
function can also be used directly.
The star_database
object contained in the new variable
is fully operational.
This document shows the functions offered by the rolap
package that support deployment in relational databases and the
automatic incremental refresh of the deployed databases.
If we need to perform OLAP analysis using tools that obtain their data from relational databases, this functionality allows R developers to carry out the transformations in R, in an even more productive way, instead of having to work with other tools.