There are some rules of thumb that I follow when using the TreatmentPatterns package. These rules tend to work well in most situations, across databases and datasets.
minPostCombinationDuration <= minEraDuration.
combinationWindow >= minEraDuration.
When creating cohorts, it is important to keep in mind that the subjects will be divided across pathways. Let’s assume we have 10000 subjects in a fictitious cohort. Let’s also assume we have 5 event cohorts.
The total number of potential pathways, assuming only monotherapies and pathways as long as the number of event cohorts, equals \(events^{events}\). If we do not allow any recurring treatments, it still equals \(5!\).
With our 5 event cohorts, these numbers are:
5^5
## [1] 3125
factorial(5)
## [1] 120
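To see why this matters, here is a rough sketch of how thinly the 10000 subjects from the example would be spread if they were distributed evenly across all potential pathways. The even distribution is an assumption made purely for illustration, and the variable names are made up for this example.

```r
n_subjects <- 10000
n_events <- 5

# Average number of subjects per pathway when treatments may recur
# (assumes subjects are spread evenly, which real data will not do)
subjects_per_path_recurring <- n_subjects / n_events^n_events

# Average number of subjects per pathway without recurring treatments
subjects_per_path_unique <- n_subjects / factorial(n_events)

subjects_per_path_recurring
subjects_per_path_unique
```

Under these assumptions the first case averages roughly 3 subjects per pathway and the second roughly 83, so many individual pathways may end up with very few subjects.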
The minEraDuration, combinationWindow, and minPostCombinationDuration settings have significant effects on how the treatment pathways are built. Consider the following example:
library(dplyr)
cohort_table <- tribble(
  ~cohort_definition_id, ~subject_id, ~cohort_start_date, ~cohort_end_date,
  1, 1, as.Date("2020-01-01"), as.Date("2021-01-01"),
  2, 1, as.Date("2020-01-01"), as.Date("2020-01-20"),
  3, 1, as.Date("2020-01-22"), as.Date("2020-02-28"),
  4, 1, as.Date("2020-02-20"), as.Date("2020-03-03")
)
cohort_table
## # A tibble: 4 × 4
## cohort_definition_id subject_id cohort_start_date cohort_end_date
## <dbl> <dbl> <date> <date>
## 1 1 1 2020-01-01 2021-01-01
## 2 2 1 2020-01-01 2020-01-20
## 3 3 1 2020-01-22 2020-02-28
## 4 4 1 2020-02-20 2020-03-03
Assume that the target cohort has cohort_definition_id 1; the rest are event cohorts.
cohort_table <- cohort_table %>%
  mutate(duration = as.numeric(cohort_end_date - cohort_start_date))
cohort_table
## # A tibble: 4 × 5
## cohort_definition_id subject_id cohort_start_date cohort_end_date duration
## <dbl> <dbl> <date> <date> <dbl>
## 1 1 1 2020-01-01 2021-01-01 366
## 2 2 1 2020-01-01 2020-01-20 19
## 3 3 1 2020-01-22 2020-02-28 37
## 4 4 1 2020-02-20 2020-03-03 12
As you can see, the durations of the treatments are 19, 37, and 12 days. Also, treatment 3 overlaps with treatment 4 for 8 days.
We can compute the overlap as follows:
cohort_table <- cohort_table %>%
  # Filter out target cohort
  filter(cohort_definition_id != 1) %>%
  mutate(overlap = case_when(
    # If there is no next treatment (lead() returns NA), set the overlap to 0
    is.na(lead(cohort_end_date)) ~ 0,
    # Compute cohort_end_date - next cohort_start_date
    # 2020-02-28 - 2020-02-20 = 8
    .default = as.numeric(cohort_end_date - lead(cohort_start_date))))
cohort_table
## # A tibble: 3 × 6
## cohort_definition_id subject_id cohort_start_date cohort_end_date duration
## <dbl> <dbl> <date> <date> <dbl>
## 1 2 1 2020-01-01 2020-01-20 19
## 2 3 1 2020-01-22 2020-02-28 37
## 3 4 1 2020-02-20 2020-03-03 12
## # ℹ 1 more variable: overlap <dbl>
We see that the overlap between treatment 2 and 3 is -2, so rather than an overlap there is a 2-day gap between these treatments. Between treatment 3 and 4 there is an 8-day overlap. There is no next treatment after treatment 4, so its overlap is 0. Let’s assume our minEraDuration = 5.
We can draw it out like so:
2: -------------------
3:                      -------------------------------------
4:                                                   ------------
If we set our combinationWindow = 5, the combination would be computed for treatments 3 and 4, because their 8-day overlap is at least as long as the combination window. This would leave us with the following treatments:
2:   -------------------
3:                        -----------------------------
3+4:                                                   --------
4:                                                             ----
Treatment 3 now lasts 29 days, treatment 4 lasts 4 days, and combination treatment 3+4 lasts 8 days. If our minPostCombinationDuration is not set properly, we can filter out either too many or too few treatments.
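The durations after splitting can be recomputed directly from the cohort dates. The following is a minimal sketch in base R, assuming the overlapping period becomes the combination treatment and the remainders on either side stay as single treatments; the `split_eras` data frame and its column names are made up for this illustration and are not part of TreatmentPatterns.

```r
# Dates of event cohorts 3 and 4 from the example cohort table
start_3 <- as.Date("2020-01-22")
end_3   <- as.Date("2020-02-28")
start_4 <- as.Date("2020-02-20")
end_4   <- as.Date("2020-03-03")

# Illustrative split: 3 alone, then 3+4 during the overlap, then 4 alone
split_eras <- data.frame(
  treatment = c("3", "3+4", "4"),
  start     = c(start_3, start_4, end_3),
  end       = c(start_4, end_3, end_4)
)

# Duration of each resulting era in days
split_eras$duration <- as.numeric(split_eras$end - split_eras$start)
split_eras
```

Comparing the `duration` column against minPostCombinationDuration shows which of the resulting eras would survive the filter.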
Assuming we set minPostCombinationDuration = 10, we would lose treatment 4 and combination treatment 3+4. This would leave us with the following path:
2: -------------------
3:                      -----------------------------
Pathway: 2-3
As a rule of thumb, setting minPostCombinationDuration <= minEraDuration seems to yield reasonable results. With minPostCombinationDuration = 5, this would leave us with the following path:
2:   -------------------
3:                        -----------------------------
3+4:                                                   --------
Pathway: 2-3-3+4
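The effect of the two settings can be mimicked with a small filter. This is a simplified sketch, not the package's actual implementation: it assumes the post-combination filter simply drops every era shorter than the threshold, and `pathway_after_filter()` is a hypothetical helper written only for this example.

```r
# Eras and durations (in days) after the combination step in the example
eras <- data.frame(
  treatment = c("2", "3", "3+4", "4"),
  duration  = c(19, 29, 8, 4)
)

# Hypothetical helper: keep eras that last at least `min_duration` days
# and join the remaining treatments into a pathway string
pathway_after_filter <- function(eras, min_duration) {
  kept <- eras$treatment[eras$duration >= min_duration]
  paste(kept, collapse = "-")
}

pathway_after_filter(eras, min_duration = 10)  # "2-3"
pathway_after_filter(eras, min_duration = 5)   # "2-3-3+4"
```

With the stricter threshold the combination treatment disappears from the pathway entirely, which is why keeping minPostCombinationDuration at or below minEraDuration tends to preserve more of the treatment history.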