Excel has a lot of buy-in. This is generally pretty unfortunate, as there are much better tools for data analysis (i.e. ). Excel is also not platform-agnostic, so there are difficulties generating Excel reports on non-Windows systems. However, Java is, and the excellent Apache POI project provides a nice interface to Excel documents in a platform-agnostic manner. Further, integrates nicely with this project through the xlsx
package to provide a suite of tools to be used in data science and report generation on any operating system.
The first step of any report generation from scratch is data preparation. Suffice it to say that R has many tools to expedite that process. For instance, the tidyverse
provides several packages for getting data into a nice, tidy format for modeling and visualization. We will presume that you have your data structured the way that you want and focus on getting that presentation into Excel.
The CellStyle
class in the xlsx
package makes possible many of the desirable traits of a good Excel report - Borders, DataFormats, Alignment, Font, and Fill. While a bit verbose to type at times, the +
operator is implemented to make building these styles more natural. For instance, we might define a theme which has a standard selection of fonts, colors, and formats.
## fonts require a workbook
createFonts <- function(wb) {
list(
data = Font(wb, heightInPoints = 11, name='Arial')
, title = Font(wb, heightInPoints = 16, name='Arial', isBold = TRUE)
, subtitle = Font(wb, heightInPoints = 13, name='Arial', isBold = FALSE, isItalic = TRUE)
)
}
## alignment
alignLeft <- Alignment(horizontal='ALIGN_LEFT', vertical='VERTICAL_CENTER', wrapText = TRUE)
alignCenter <- Alignment(horizontal='ALIGN_CENTER', vertical='VERTICAL_CENTER', wrapText=TRUE)
## data formats
dataFormatDate <- DataFormat('m/d/yyyy')
dataFormatNumberD <- DataFormat('0.0')
## fill
fillPrimary <- Fill('#cc0000','#cc0000','SOLID_FOREGROUND')
fillSecondary <- Fill('#ff6666','#ff6666','SOLID_FOREGROUND')
Once we have defined such standard cell-styles, it is straightforward to use them with the additive CellStyle framework.
## The dataset
numbercol <- 9
mydata <- as.data.frame(lapply(1:numbercol,function(x){runif(15, 0,200)}))
mydata <- setNames(mydata,paste0('col',1:numbercol))
## Build report
wb <- createWorkbook()
sh <- createSheet(wb, 'Report')
f <- createFonts(wb)
headerrow <- createRow(sh, 1:2)
headercell <- createCell(headerrow, 1:ncol(mydata))
## title
addMergedRegion(sh,1,1,1,ncol(mydata))
lapply(headercell[1,],function(cell) {
setCellValue(cell, 'Title of Report')
setCellStyle(cell, CellStyle(wb) + f$title + alignCenter)
})
## subtitle
addMergedRegion(sh, 2,2, 1,ncol(mydata))
lapply(headercell[2,],function(cell) {
setCellValue(cell, 'A fantastic report about nothing')
setCellStyle(cell, CellStyle(wb) + f$subtitle + alignCenter )
})
## cell styles for data
cslist <- lapply(1:ncol(mydata), function(x){CellStyle(wb) + f$data + alignCenter + dataFormatNumberD})
cslist[1:2] <- lapply(cslist[1:2], function(x){x + alignLeft}) ## left align first two columns
## add data
workrow <- 4
addDataFrame(mydata, sh
, col.names=TRUE
, row.names = FALSE
, startRow = workrow
, startColumn = 1
, colStyle = setNames(cslist,1:numbercol)
, colnamesStyle = CellStyle(wb) + f$subtitle + alignCenter + fillPrimary
)
workrow <- workrow + nrow(mydata) + 1 ## + 1 for header
## add total row... sorta
## - (just the first row because I am lazy)
addDataFrame(mydata[1,], sh
, col.names=FALSE
, row.names=FALSE
, startRow = workrow
, startColumn = 1
, colStyle = setNames(lapply(cslist,function(x){x + fillSecondary}),1:numbercol)
)
saveWorkbook(wb, 'excel_report.xlsx')
Once you get used to some of the verbosity, the elegance of automatically creating nicely formatted Excel reports on a UNIX platform from R begins to shine. Further, with a bit of work implementing an R API, we get the benefit of a robust Java community debugging issues behind the scenes at Apache. If you would like to contribute on improving the R API and adding functionality, you can do so on github.
Bonus: you can even customize print formatting and a whole host of other things! There is more to come on that.