Advanced DBI Usage

James Wondrasek, Kirill Müller

17/03/2020

Who this tutorial is for

This tutorial is for you if you need to use a richer set of SQL features such as data manipulation queries, parameterized queries and queries performed using SQL’s transaction features. See vignette("DBI", package = "DBI") for a more basic tutorial covering connecting to DBMS and executing simple queries.

How to run more complex queries using DBI

dbGetQuery() works by calling a number of functions behind the scenes. If you need more control you can manually build your own query, retrieve results at your selected rate, and release the resources involved by calling the same functions.

These functions are:

library(DBI)

con <- dbConnect(
  RMariaDB::MariaDB(),
  host = "relational.fit.cvut.cz",
  port = 3306,
  username = "guest",
  password = "relational",
  dbname = "sakila"
)

res <- dbSendQuery(con, "SELECT * FROM film WHERE rating = 'G'")
df <- dbFetch(res, n = 3)
dbClearResult(res)

head(df, 3)

How to read part of a table from a database

If your dataset is large you may want to fetch a limited number of rows at a time. As demonstrated below, this can be accomplished by using a while loop where the function dbHasCompleted() is used to check for ongoing rows, and dbFetch() is used with the n = X argument, specifying how many rows to return on each iteration. Again, we call dbClearResult() at the end to release resources.

res <- dbSendQuery(con, "SELECT * FROM film")
while (!dbHasCompleted(res)) {
  chunk <- dbFetch(res, n = 300)
  print(nrow(chunk))
}
dbClearResult(res)

How to use parameters (safely) in SQL queries

dbSendQuery() can be used with parameterized SQL queries. DBI supports two ways to avoid SQL injection attacks from user-supplied parameters: quoting and parameterized queries.

Quoting

Quoting of parameter values is performed using the function dbQuoteLiteral(), which supports many R data types, including date and time.1

In cases where users may be supplying table or column names to use in the query for data retrieval, those names or identifiers must also be escaped. As there may be DBMS-specific rules for escaping these identifiers, DBI provides the function dbQuoteIdentifier() to generate a safe string representation.

safe_id <- dbQuoteIdentifier(con, "rating")
safe_param <- dbQuoteLiteral(con, "G")

query <- paste0("SELECT title, ", safe_id, " FROM film WHERE ", safe_id, " = ", safe_param)
query

res <- dbSendQuery(con, query)
dbFetch(res)
dbClearResult(res)

The same result can be had by using glue::glue_sql(). It performs the same safe quoting on any variable or R statement appearing between braces within the query string.

id <- "rating"
param <- "G"
query <- glue::glue_sql("SELECT title, {`id`} FROM film WHERE {`id`} = {param}", .con = con)

df <- dbGetQuery(con, query)
head(df, 3)

Parameterized queries

Rather than performing the parameter substitution ourselves, we can push it to the DBMS by including placeholders in the query. Different DBMS use different placeholder schemes, DBI passes through the SQL expression unchanged.

MariaDB uses a question mark (?) as placeholder and expects an unnamed list of parameter values. Other DBMS may use named parameters. We recommend consulting the documentation for the DBMS you are using. As an example, a web search for “mariadb parameterized queries” leads to the documentation for the PREPARE statement which mentions:

Within the statement, “?” characters can be used as parameter markers to indicate where data values are to be bound to the query later when you execute it.

Currently there is no list of which placeholder scheme a particular DBMS supports.

Placeholders only work for literal values. Other parts of the query, e.g. table or column identifiers, still need to be quoted with dbQuoteIdentifier().

For a single set of parameters, the params argument to dbSendQuery() or dbGetQuery() can be used. It takes a list and its members are substituted in order for the placeholders within the query.

params <- list("G")
safe_id <- dbQuoteIdentifier(con, "rating")

query <- paste0("SELECT * FROM film WHERE ", safe_id, " = ?")
query

res <- dbSendQuery(con, query, params = params)
dbFetch(res, n = 3)
dbClearResult(res)

Below is an example query using multiple placeholders with the MariaDB driver. The placeholders are supplied as a list of values ordered to match the position of the placeholders in the query.

q_params <- list("G", 90)
query <- "SELECT title, rating, length FROM film WHERE rating = ? AND length >= ?"

res <- dbSendQuery(con, query, params = q_params)
dbFetch(res, n = 3)
dbClearResult(res)

When you wish to perform the same query with different sets of parameter values, dbBind() is used. There are two ways to use dbBind(). Firstly, it can be used multiple times with same query.

res <- dbSendQuery(con, "SELECT * FROM film WHERE rating = ?")
dbBind(res, list("G"))
dbFetch(res, n = 3)
dbBind(res, list("PG"))
dbFetch(res, n = 3)
dbClearResult(res)

Secondly, dbBind() can be used to execute the same statement with multiple values at once.

res <- dbSendQuery(con, "SELECT * FROM film WHERE rating = ?")
dbBind(res, list(c("G", "PG")))
dbFetch(res, n = 3)
dbClearResult(res)

Use a list of vectors if your query has multiple parameters:

q_params <- list(c("G", "PG"), c(90, 120))
query <- "SELECT title, rating, length FROM film WHERE rating = ? AND length >= ?"

res <- dbSendQuery(con, query, params = q_params)
dbFetch(res, n = 3)
dbClearResult(res)

Always disconnect from the database when done.

dbDisconnect(con)

SQL data manipulation - UPDATE, DELETE and friends

For SQL queries that affect the underlying database, such as UPDATE, DELETE, INSERT INTO, and DROP TABLE, DBI provides two functions. dbExecute() passes the SQL statement to the DBMS for execution and returns the number of rows affected. dbSendStatement() performs in the same manner, but returns a result object. Call dbGetRowsAffected() with the result object to get the count of the affected rows. You then need to call dbClearResult() with the result object afterwards to release resources.

In actuality, dbExecute() is a convenience function that calls dbSendStatement(), dbGetRowsAffected(), and dbClearResult(). You can use these functions if you need more control over the query process.

The subsequent examples use an in-memory SQL database provided by RSQLite::SQLite(), because the remote database used in above examples does not allow writing.

library(DBI)
con <- dbConnect(RSQLite::SQLite(), ":memory:")

dbWriteTable(con, "cars", head(cars, 3))

dbExecute(
  con,
  "INSERT INTO cars (speed, dist) VALUES (1, 1), (2, 2), (3, 3)"
)

rs <- dbSendStatement(
  con,
  "INSERT INTO cars (speed, dist) VALUES (4, 4), (5, 5), (6, 6)"
)
dbGetRowsAffected(rs)
dbClearResult(rs)

dbReadTable(con, "cars")

Do not forget to disconnect from the database at the end.

dbDisconnect(con)

SQL transactions with DBI

DBI allows you to group multiple queries into a single atomic transaction. Transactions are initiated with dbBegin() and either made persistent with dbCommit() or undone with dbRollback(). The example below updates two tables and ensures that either both tables are updated, or no changes are persisted to the database and an error is thrown.

con <- dbConnect(RSQLite::SQLite(), ":memory:")

dbWriteTable(con, "cash", data.frame(amount = 100))
dbWriteTable(con, "account", data.frame(amount = 2000))

withdraw <- function(amount) {
  # All operations must be carried out as logical unit:
  dbExecute(con, "UPDATE cash SET amount = amount + ?", list(amount))
  dbExecute(con, "UPDATE account SET amount = amount - ?", list(amount))
}

withdraw_transacted <- function(amount) {
  # Ensure atomicity
  dbBegin(con)

  # Perform operation
  withdraw(amount)

  # Persist results
  dbCommit(con)
}

withdraw_transacted(300)

After withdrawing 300 credits, the cash is increased and the account is decreased by this amount. The transaction ensures that either both operations succeed, or no change occurs.

dbReadTable(con, "cash")
dbReadTable(con, "account")

We can roll back changes manually if necessary. Do not forget to call dbRollback() in case of error, otherwise the transaction remains open indefinitely.

withdraw_if_funds <- function(amount) {
  dbBegin(con)
  withdraw(amount)
  # Rolling back after detecting negative value on account:
  if (dbReadTable(con, "account")$amount >= 0) {
    dbCommit(con)
    TRUE
  } else {
    message("Insufficient funds")
    dbRollback(con)
    FALSE
  }
}

withdraw_if_funds(5000)
dbReadTable(con, "cash")
dbReadTable(con, "account")

dbWithTransaction() simplifies using transactions. Pass it a connection and the code you want to run as a transaction. It will execute the code and call dbCommit() on success and call dbRollback() if an error is thrown.

withdraw_safely <- function(amount) {
  dbWithTransaction(con, {
    withdraw(amount)
    if (dbReadTable(con, "account")$amount < 0) {
      stop("Error: insufficient funds", call. = FALSE)
    }
  })
}

withdraw_safely(5000)
dbReadTable(con, "cash")
dbReadTable(con, "account")

As usual, do not forget to disconnect from the database when done.

dbDisconnect(con)

Conclusion

That concludes the major features of DBI. For more details on the library functions covered in this tutorial and the vignette("DBI", package = "DBI") introductory tutorial see the DBI specification at vignette("spec", package = "DBI"). If you are after a data manipulation library that works at a higher level of abstraction, check out dplyr. It is a grammar of data manipulation that can work with local dataframes and remote databases and uses DBI under the hood.


  1. An older method, dbQuoteString(), was used to quote string values only. The dbQuoteLiteral() method forwards to dbQuoteString() for character vectors. Users do not need to distinguish between these two cases.↩︎