Peter Meißner
2015-12-14
Base R’s long-standing choice of setting stringsAsFactors within data.frame() and as.data.frame() to TRUE by default is, on the one hand, a design decision that makes sense (more efficient storage, and statistical models are built with factors) and, on the other hand, supposedly the most often complained-about piece of the R infrastructure. A search through the source code of all CRAN packages in December 2015 (https://github.com/search?utf8=%E2%9C%93&q=user%3Acran+stringsAsFactors&type=Code) returned 3,795 results for mentions of stringsAsFactors, and most of them simply set the value to FALSE.

The hellno package provides an explicit solution to the problem without changing R itself or having to mess around with options. It does so by providing alternative data.frame() and as.data.frame() functions that are in fact simple wrappers around base R’s data.frame() and as.data.frame(), with the stringsAsFactors option set to HELLNO (equal to FALSE) by default.
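The wrapping idea can be sketched in a few lines of R. This is a minimal sketch assuming the wrappers do nothing more than forward their arguments to base R with a different default; the HELLNO constant mirrors the name used above, but the code is not taken from the package's source. (Since R 4.0.0, base R itself defaults to stringsAsFactors = FALSE, so the sketch behaves the same on any version.)

```r
# Hypothetical sketch of hellno-style wrappers; NOT the package's actual source.
HELLNO <- FALSE

data.frame <- function(..., stringsAsFactors = HELLNO) {
  # Forward everything to base R, changing only the default
  base::data.frame(..., stringsAsFactors = stringsAsFactors)
}

as.data.frame <- function(x, ..., stringsAsFactors = HELLNO) {
  # Same idea for the as.data.frame() generic
  base::as.data.frame(x, ..., stringsAsFactors = stringsAsFactors)
}

df3 <- data.frame(a = letters[1:3])
class(df3$a)                                     # "character"
class(as.data.frame(list(a = letters[1:2]))$a)   # "character"

# Only the default changes: an explicit argument still wins
class(data.frame(a = letters[1:3], stringsAsFactors = TRUE)$a)  # "factor"
```

Because the wrappers only swap the default, callers who want factors can still ask for them per call.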
R’s default behaviour…
df1 <- data.frame(a=letters[1:3])
df1$a
## [1] a b c
## Levels: a b c
class(df1$a)
## [1] "factor"
R’s behaviour after loading the package:
library(hellno)
##
## Attaching package: 'hellno'
##
## The following objects are masked from 'package:base':
##
## as.data.frame, data.frame
df2 <- data.frame(a=letters[1:3])
df2$a
## [1] "a" "b" "c"
class(df2$a)
## [1] "character"
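The masking message above can also be checked programmatically. A small base-R sketch: utils::find() lists every attached environment that defines a name, and the first entry is the one an unqualified call resolves to. The expected values below assume a session where nothing masks data.frame.

```r
# Which attached environments define data.frame? The first match wins.
utils::find("data.frame")
# -> "package:base" in a session without hellno;
#    with hellno attached, "package:hellno" would be expected to come first.

# Namespace in which the currently visible data.frame was defined
environmentName(environment(data.frame))
# -> "base" when nothing masks it
```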
While using the hellno package in interactive R sessions is nice, the same could in fact have been achieved simply by calling options("stringsAsFactors"=FALSE). The strength of hellno is that it can be imported when writing packages, thereby providing as.data.frame() and data.frame() with the stringsAsFactors option consistently set to FALSE. Once imported, stringsAsFactors=FALSE will be the default for all uses of data.frame() and as.data.frame() within all package functions BUT NOT OUTSIDE OF THEM. Thus hellno eases programming while still ensuring that package users can choose whichever flavor of stringsAsFactors they like best.
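Why the effect stays inside the package follows from how R looks up names: a package function searches its namespace, then the package's imports, then base, and only then the global environment and search path. The sketch below mimics that chain with plain environments; imports_env, ns_env, hellno_style_df and pkg_fun are hypothetical stand-ins, since real namespaces are created by R itself.

```r
# Hypothetical simulation of package scoping; real namespaces are built by R.
hellno_style_df <- function(..., stringsAsFactors = FALSE) {
  base::data.frame(..., stringsAsFactors = stringsAsFactors)
}

# Stand-in for the "imports" environment, holding the imported wrapper
imports_env <- new.env(parent = baseenv())
assign("data.frame", hellno_style_df, envir = imports_env)

# Stand-in for the package namespace, whose parent is the imports env
ns_env <- new.env(parent = imports_env)

# A "package function": its enclosing environment is the namespace
pkg_fun <- function() data.frame(a = letters[1:3])$a
environment(pkg_fun) <- ns_env

inside  <- get("data.frame", envir = ns_env)      # what package code sees
outside <- get("data.frame", envir = globalenv()) # what user code sees

identical(inside, hellno_style_df)    # TRUE
identical(outside, base::data.frame)  # TRUE
pkg_fun()                             # "a" "b" "c"
```

Nothing is assigned into the global environment or the search path, which is why code outside the (simulated) package keeps using base::data.frame.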
Let us see how this works with a small example. Again, let us start by loading the hellno package:
library(hellno)
data.frame(a=letters[1:2])$a
## [1] "a" "b"
As shown before, character vectors are not converted to factors.
We unload hellno again to start clean.
unloadNamespace("hellno")
Now we install the hellnotests package from GitHub and load it. The package uses hellno internally in two functions. While internal uses of data.frame() and as.data.frame() will work with stringsAsFactors=FALSE as the default, this does not change how things work anywhere else.
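A package opts into this behaviour through its NAMESPACE file. The fragment below is a sketch under the assumption that a package imports just the two masking functions; it is not copied from hellnotests' actual files, and the package's DESCRIPTION would also have to list hellno in its Imports field.

```r
# NAMESPACE (sketch): import only the two wrappers from hellno
importFrom(hellno, data.frame, as.data.frame)

# Alternatively, import everything hellno exports:
# import(hellno)
```

Importing only the two functions keeps the package's dependency on hellno explicit and narrow.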
if (!("hellnotests" %in% rownames(installed.packages()))) {
  devtools::install_github("petermeissner/hellnotests")
}
library(hellnotests)
data.frame(a=letters[1:2])$a
## [1] a b
## Levels: a b
Meanwhile, all functions within the package use hellno’s alternative implementations:
hellno_df
## function ()
## {
## data.frame(a = letters[1:3])$a
## }
## <environment: namespace:hellnotests>
… and hence, for them, strings are no longer converted to factors:
hellno_df()
## [1] "a" "b" "c"
… and once again to bring the point home:
data.frame(a=letters[1:2])$a
## [1] a b
## Levels: a b
WRITING PACKAGES WITH HELLNO DOES NOT CHANGE OUTSIDE BEHAVIOR.