predict. (#223)
@call slot of objects may look slightly different (but should function
identically). (#234)
bibentry().
Minor revision to address a failing test.
match_on(), now more simply calculates the contrast to enable more
intuitive results. (Thanks Noah Greifer, #220)
dbind() will now properly support binding more than 26 unique matrices
when renaming is necessary; in fact it supports up to 18,278 uniquely
renamed matrices.
match_on()
using the argument method = "rank_mahalanobis" was accidentally
returning the squared distance rather than the distance. This has been
fixed. To recover results using the squared distance, square the
results, e.g.: match_on(..., method = "rank_mahalanobis")^2. (Thanks
Noah Greifer, #218)
as.list.BlockedInfinitySparseMatrix()
to split a single BlockedInfinitySparseMatrix into a list of
InfinitySparseMatrix based upon the separate blocks. (Called via
as.list(b) when b is a BlockedInfinitySparseMatrix.)
dbind()
for binding several distance matrices into a single
BlockedInfinitySparseMatrix. Valid inputs include any distance
convertible into an InfinitySparseMatrix, or
BlockedInfinitySparseMatrix, or lists of these. (#65)
License_is_FOSS
and License_restricts_use flags after 0.10.0 transition to an open
license.
optmatch::strata to be used in place of survival::strata. Loading
survival and masking strata should not cause issues either.
(Note: 0.10.1 and 0.10.2 were functionally equivalent releases updated
to address an issue with CRAN and the License_is_FOSS and
License_restricts_use flags.)
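The figure of 18,278 uniquely renamed matrices quoted for dbind() earlier in these notes can be checked with a little arithmetic. The sketch below assumes that renaming draws labels of one, two, or three letters from the 26-letter alphabet; that naming scheme is an assumption made for illustration and is not confirmed by these notes.

```r
# Count the distinct labels of length 1, 2 and 3 that can be formed
# from 26 letters (assumed naming scheme, for illustration only).
label_lengths <- 1:3
n_labels <- sum(26^label_lengths)  # 26 + 676 + 17576
n_labels
```

Under that assumption the total agrees with the limit stated in the dbind() entry.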
help(fullmatch) for a discussion on those, and the new argument to
fullmatch(), solver =.
survey::svyglm() (#194)
survey::mad and survey::med interface
within=
arguments to match_on(), or functions calling match_on() such as
pairmatch() or fullmatch(), were sometimes ignored (#181).
fullmatch()
or pairmatch() found it infeasible to create matches within an exact
matching category, under some circumstances all members of that
category were being placed into a single category labeled 1.NA, or 2.NA
etc. Instead, all members of that category are now NA (#203).
match_on(), scores() to misinterpret propensity or other scores fitted
with survey::svyglm() (#194).
boxplot() gains a method for svyglm objects, e.g. propensity score
models fitted with case weights via survey::svyglm() (#194).
match_on.glm()’s arguments has changed slightly: to circumvent scale
standardization when matching on a propensity score or other index, you
should now pass standardization.scale = 1, not
standardization.scale = NULL (#194).
summary.optmatch() to fail because of NAs in the treatment variable
(#155).
exclude
argument to match_on() mirroring the exclude argument for caliper().
Optmatch objects now support an update() function, update.Optmatch().
(#54)
Optmatch objects can be combined via a c() function, c.Optmatch().
(#68)
labelled treatment vectors which often arise when importing from Stata
or SPSS. (#159)
matchfailed(). (#175)
summary.optmatch().
if(vectorOfThings)
usage that will give an error in upcoming R release.
controls times the number of treatments, it now attempts to match in
that stratum by leaving out some of the treatment units. (#116)
treatment_new = treatment == "T".
data
argument is excluded from fullmatch() or pairmatch() and num_NA > 0
entries in the treatment status vector are NA, then the length of the
vector produced by fullmatch() or pairmatch() won’t match the length of
the treatment status vector, having num_NA fewer observations. Don’t
forget to pass a data argument!
min.controls/mean.controls/max.controls directives would have been
mistakenly applied to the wrong subclasses, resulting in strange
warnings and, potentially, spurious match failures or unintended
structural restrictions in some subclasses (#129).
fullmatch() to automatically fail. I.e. we’ve restored the behaviour of
the software prior to version 0.8. (#132)
summary()
methods for InfinitySparseMatrix (summary.InfinitySparseMatrix()),
BlockedInfinitySparseMatrix (summary.BlockedInfinitySparseMatrix()) and
DenseMatrix (summary.DenseMatrix()). I.e., you can call summary() on
the result of a call to match_on() or caliper(). The information this
returns may be useful for selecting caliper widths, and for managing
computational burdens with large matching problems.
pairmatch(), fullmatch() or match_on(), then the factor “fac” will both
serve as an independent variable for the propensity model and an exact
matching variable (#101). See the examples on the help documentation
for fullmatch().
pairmatch()
and fullmatch() no longer generate “matched.distances” attributes for
their results. To get this information, use matched.distances().
fill.NAs() directly to glm() or similar. Use the traditional formula
and data argument version. See help documentation for fill.NAs() for
examples.
boxplot()
method for fitted propensities ignoring varwidth argument (#113);
various minor issues affecting package development and deployment
(#110,…).
stratumStructure().
contr.match_on(), a new default contrasts function for making
Mahalanobis and Euclidean distances. Previously we used R defaults,
which (a) generated different answers for the same factor depending on
the ordering of the levels and (b) led to different distances for
{0,1}-valued numeric variables and two level factors. (#80)
fullmatch()
with feasible combinations of min.controls, mean.controls/max.controls
and max.controls (#92)
fullmatch() or pairmatch() to create distance specifications directly.
glm() method for match_on() that caused observations with fixable NAs
to be dropped too often.
distUnion()
allows combining arbitrary distance specifications.
antiExactMatch() provides for matches that may only occur between
treated and control units with different values on a factor variable.
This is the opposite of exactMatch(), which ensures matches occur
within factor levels.
data
argument in more cases when using the summary() method when the RItools
package is present.
omit.fraction argument when there are unmatched controls.
minExactMatch() function.
optmatch_verbose_message option to provide additional warnings.
fullmatch().
caliper() function that allows returning values that fit the caliper
instead of just indicators of which entries fit the caliper width.
match_on().
Optmatch objects now preserves (and subsets) the subproblem attribute.
Solver limits now depend on machine limits, not arbitrary constants
defined by the optmatch maintainers. For large problems, users will see
a warning, but the solver will attempt to solve.
fullmatch() and pairmatch() can now take distance generating arguments
directly, instead of having to first call match_on(). See the
documentation for these two functions for more details.
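A minimal sketch of the two routes, assuming the optmatch package is installed and using its bundled nuclearplants data. The covariates chosen (cost, t1) are arbitrary and serve only as an illustration.

```r
library(optmatch)
data(nuclearplants)

# Two-step route: build the distance first, then match on it.
dist_mat <- match_on(pr ~ cost + t1, data = nuclearplants)
fm_two_step <- fullmatch(dist_mat, data = nuclearplants)

# One-step route: hand the formula straight to fullmatch(), which
# builds the distance itself before matching.
fm_direct <- fullmatch(pr ~ cost + t1, data = nuclearplants)

# Each result has one entry per row of the data.
length(fm_direct) == nrow(nuclearplants)
```

Passing data = in both routes keeps the result in the same order as the rows of the data frame.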
Infeasibility recovery in fullmatch(). When passing a combination of
constraints (e.g. max.controls) that would make the matching
infeasible, fullmatch() will now attempt to find a feasible match that
respects those constraints, which will likely result in omitting some
control units.
An additional argument to fullmatch(), mean.controls, is an alternative
to the previous omit.fraction. (Only one of the two arguments can be
presented.) The match will attempt to average mean.controls number of
controls per treatment.
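As a hedged illustration of mean.controls, the sketch below asks for an average of two controls per treated unit while bounding each matched set at one to three controls. It assumes optmatch is installed; the nuclearplants variables and the constraint values are chosen only for illustration.

```r
library(optmatch)
data(nuclearplants)

# Bound each matched set at 1 to 3 controls per treated unit and aim
# for an average of 2; omit.fraction is deliberately left unset because
# only one of the two arguments may be given.
fm_mean <- fullmatch(pr ~ cost + t1, data = nuclearplants,
                     min.controls = 1, max.controls = 3,
                     mean.controls = 2)

# Inspect the resulting matched-set structure.
summary(fm_mean)
```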
Each Optmatch object now carries with it the constraints used to
generate it (e.g. max.controls) as well as a hashed version of the
distance it matched up, to help with some debugging/error checking but
avoiding having to carry the entire distance matrix around.
Creating a distance matrix prior to matching is now optional.
fullmatch() now accepts arguments from which match_on() would create a
distance, and creates the match behind the scenes.
Performance enhancements for distance calculations.
Several new utility functions, including subdim(),
optmatch_restrictions(), optmatch_same_distance(),
num_eligible_matches(). See their help documentation for additional
details.
Arithmetic operations between InfinitySparseMatrices and vectors are supported. The operation is carried out as column by vector steps.
scores() function allows including model predictions (such as
propensity scores) in formulas directly (such as combining multiple
propensity scores). The scores() function is preferred to predict() as
it makes several smart choices to avoid dropping observations due to
partial missingness and other useful preparations for matching.
match_on() is now an S3 generic function, which solves several bugs
using propensity models from other packages.
summary() method was giving overly pessimistic warnings about failures.
Fixed bug in how Optmatch objects were printing.
mdist() is now deprecated, in favor of match_on().
full() and pair() are now aliases to fullmatch() and pairmatch().
All match_on() methods take caliper arguments (formerly just the
numeric method and derived methods had this argument).
boxplot methods for fitted propensity score methods (glm() and
bigglm())
fill.NAs() now takes contrasts.arg argument to mimic model.matrix()
Several bug fixes in examples, documentation
The methods pscore.dist() and mahal.dist() are now deprecated, with
useful error messages pointing users to replacements.
Significant performance improvements for sparse matching problems.
Functions unmatched() and matched() were backwards. Corrected.
More efficient data structure for sparse matching problems, those with
relatively few allowed (finite) distances between units. Sparse
problems often arise when calipers are employed. The new data structure
(InfinitySparseMatrix) behaves like a simple matrix, allowing cbind(),
rbind(), and subset() operations, making it easier to work with than
the older optmatch.dlist data structure.
match_on(): A series of methods to generate matching problems using the
new data structure when appropriate, or using a standard matrix when
the problem is dense. This function is being deployed alongside the
mdist() function to provide complete backward compatibility. New
development will focus on this function for distance creation, and
users are encouraged to use it right away. One difference for mdist()
users is the within argument. This argument takes an existing distance
specification and limits the new comparisons to only those pairs that
have finite distances in the within argument. See the match_on(),
exactMatch(), and caliper() documentation for more details.
exactMatch(): A new function to create stratified matching problems (in
which cross-strata matches are forbidden). Users can specify the strata
using either a factor vector or a convenient formula interface. The
results can be used in calls to match_on() to limit distance
calculations to only within-strata treatment-control pairs.
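The interplay of exactMatch() and the within argument of match_on() can be sketched as follows, assuming optmatch is installed. The stratifying variable pt and the covariate cost come from the bundled nuclearplants data and are picked purely for illustration.

```r
library(optmatch)
data(nuclearplants)

# Forbid matches across levels of pt using the formula interface.
strata_spec <- exactMatch(pr ~ pt, data = nuclearplants)

# Compute distances on cost only for the pairs that strata_spec allows.
dist_within <- match_on(pr ~ cost, within = strata_spec,
                        data = nuclearplants)

# Match within strata; data = keeps the result in row order.
fm_strata <- fullmatch(dist_within, data = nuclearplants)

# Cross-tabulate matched sets against the stratifying variable.
table(fm_strata, nuclearplants$pt)
```

In the cross-tabulation, each matched set should fall entirely within one level of pt if the stratification was respected.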
New data argument to fullmatch() and pairmatch(): This argument will
set the order of the match to that of the row.names, names, or contents
of the passed data.frame or vector. This avoids potential bugs caused
when the optmatch objects were in a different order than users’ data.
Test suite expanded and now uses the testthat library.
fill.NAs() allows (optionally) filling in all columns (previously, the
first column was assumed to be an outcome or treatment indicator and
was not filled in).
New tools to find minimum feasible constraints: Large matching problems
could exceed the upper limit for a matching problem. The functions
minExactMatch() and maxCaliper() find the smallest interaction of
potential factors for stratified matchings or the largest (most
generous) caliper, respectively, that make the problem small enough to
fit under the maximum problem size limit. See the help pages for these
functions for more information.
1.NA or similar). This avoids some obscure bugs when feeding the
results of fullmatch() to other functions.
FOR A DETAILED CHANGELOG, SEE https://github.com/markmfredrickson/optmatch
pairmatch() has a new option, remove.unmatchables, that may be useful
in conjunction with caliper matching. With remove.unmatchables = TRUE,
prior to matching any units with no counterparts within caliper
distance are removed. Pair matching can still fail, if for example for
two distinct treatment units only a single control, the same one, is
available for matching to them; but remove.unmatchables eliminates one
simple and common reason for pair matching to fail.
Applying summary() to an optmatch object now creates a summary.optmatch
object containing the summary information, in addition to reporting it
to the console (via a summary.optmatch() method for print()).
mdist.formula() no longer requires an explicit data argument. I.e., you
can get away with a call like mdist(Treat~X1+X2|S) if the variables
Treat, X1, X2 and S are available in the environment you’re working
from (or in one of its parent environments). Previously you would have
had to do mdist(Treat~X1+X2|S, data=mydata). (The latter formulation is
still to be preferred, however, in part because with it mdist() gets to
use data’s row names, whereas otherwise it would have to make up row
names.)
fill.NAs() replaces missing observations (i.e. NA values) with
minimally informative values (i.e. the mean of observed columns).
fill.NAs() handles functions in formulas intelligently and provides
missing indicators for each variable. See the help documentation for
more information and examples.
mdist.function() method now properly returns an optmatch.dlist object
for use in summary.optmatch(), etc.
mdist.function() maintains label on grouping factor.
New mdist() method to extract propensity scores from models fitted
using bigglm() in package biglm.
mdist()’s formula method now understands grouping factors indicated
with a pipe (|)
Informative error message for mdist() called on numeric vectors
Updated mdist() documentation
There is a new generic function, mdist(), for creating matching
distances. It accepts: fitted glm’s, which it uses to extract
propensity distances; formulas, which it uses to construct squared
Mahalanobis distances; and functions, with which a user can construct
his or her own type of distance. The function method is more intuitive
to work with than the older makedist() function.
A new function, caliper(), builds on the mdist() structure to provide a
convenient way to add calipers to a distance. In contrast to earlier
ways of adding calipers, caliper() has an optional argument to specify
observations to be excluded from the caliper requirement; this permits
one to relax it for just a few observations, for instance.
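A short sketch of adding a caliper to a propensity distance, assuming optmatch is installed. It builds the distance with match_on() rather than mdist(), since other entries in these notes mark mdist() as deprecated in favor of match_on(). The model covariates and the caliper width of 1 are illustrative choices, not recommendations.

```r
library(optmatch)
data(nuclearplants)

# Fit a propensity model and turn it into a matching distance.
ps_model <- glm(pr ~ cost + t1 + t2, data = nuclearplants,
                family = binomial())
ps_dist <- match_on(ps_model, data = nuclearplants)

# caliper() marks pairs farther apart than `width` as forbidden;
# adding it to the distance applies the restriction.
ps_calipered <- ps_dist + caliper(ps_dist, width = 1)

# Match under the caliper; units with no permissible partner are
# left unmatched (NA).
fm_cal <- fullmatch(ps_calipered, data = nuclearplants)
summary(fm_cal)
```

The exclude argument of caliper() can then be used to exempt named observations from the width restriction, as the entry above describes.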
summary.optmatch() now removes strata in which matching failed (because
the matching problem was found to be infeasible) before summarizing. It
also indicates when such strata are present, and how many observations
fall in them.
Demo has been updated to reflect changes as of version 0.4, 0.5, 0.6.
Subsetting of objects of class Optmatch now preserves the
matched.distances attribute.
Fixed bug in maxControlsCap()/minControlsCap() whereby they behaved
unreliably on subclasses within which some subjects had no permissible
matches.
Removed unnecessary panic in fullmatch() when it was given a
min.controls argument with attributes other than names (as when it is
created by tapply()).
Fixed bug wherein summary.optmatch() fails to retrieve balance tests if
given a propensity model that had function calls in its formula.
Documentation pages for fullmatch(), pairmatch() filled out a bit.
summary.optmatch() completely revised. It now reports information about
the configuration of the matched sets and about matched distances. In
addition, if given a fitted propensity model as a second argument it
summarizes covariate balance.