## Kazerooni 2016

### Excerpt from published methods

Patients were excluded from this study if they were missing weight values from any of the three measuring points of the study (pre-, mid-, and post-weight) … Pre-weight was defined as the most recent weight within 30 days before starting topiramate therapy. Mid-study period was defined as the earliest weight taken between 3 and 6 months post initiation of therapy. Post-study period was defined as the earliest weight taken between 6 and 12 months post initiation of therapy.

### Translation in pseudocode

DEFINE t_0 := treatment start date for person i IN 1:I
DEFINE weight_ij for person i IN 1:I, time point j IN 1:J {pre-trt, mid-trt, post-trt}
FOR i IN 1:I
weight_i1 := most recent weight with {t_0 - 30 days <= weight date <= t_0}
weight_i2 := earliest weight with {t_0 + 90 days <= weight date <= t_0 + 180 days}
weight_i3 := earliest weight with {t_0 + 180 days < weight date <= t_0 + 365 days}
FOR j IN 1:J
IF (weight_ij IS NULL) (weight measure missing at jth time point)
EXCLUDE person i
END FOR
END FOR
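
The window boundaries in the pseudocode can be checked against a few day offsets. The following is a minimal base R sketch, not part of the published method or of `Kazerooni2016.f()`: the helper name `classify_window` is hypothetical, and it assumes the month boundaries are converted to 90, 180, and 365 days as in the pseudocode above.

```r
# Hypothetical helper: label a weight by the number of days between
# its measurement date and the treatment start date (t_0).
classify_window <- function(days) {
  ifelse(days >= -30 & days <= 0,   "pre",
  ifelse(days >=  90 & days <= 180, "mid",
  ifelse(days >  180 & days <= 365, "post", NA_character_)))
}

# Day offsets chosen to sit on and around each boundary
days <- c(-45, -10, 0, 60, 90, 180, 181, 365, 400)
data.frame(days, window = classify_window(days))
```

Offsets that fall outside every window (here -45, 60, and 400) are labelled `NA`, which is why a patient whose only weights fall in those gaps is excluded.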

### Algorithm in R Code

#' @title Kazerooni et al. 2016 Weight Cleaning Algorithm
#' @param DF object of class data.frame, containing id and measures
#' @param id string corresponding to the name of the column of patient identifiers in DF
#' @param measures string corresponding to the name of the column of measures in DF, e.g., numeric weight data if using to clean weight data.
#' @param tmeasures string corresponding to the name of the column of measure dates and/or times in DF
#' @param startPoint string corresponding to the name of the column in DF holding the time from which subsequent measurement dates will be assessed; it should be the same for every row of a given person. E.g., if t = 0 (t[1]) corresponds to an index visit held by the variable 'VisitDate', then startPoint should be set to 'VisitDate'
#' @param t numeric vector of time points (in days from startPoint) at which to collect measurements, e.g., c(0, 90, 180) for measure collection at t = 0, t = 90 (3 months from t = 0), and t = 180 (6 months from t = 0). Default is c(0, 90, 180) according to Kazerooni et al. 2016
#' @param windows list of two numeric vectors, LB and UB, giving the lower and upper bounds (in days) of the measurement collection window around each time point in t. E.g., Kazerooni et al. 2016 use c(30, 0, 0) for the lower bound and c(0, 90, 185) for the upper bound at t of c(0, 90, 180), implying that the closest measurement to t[1] (= 0) will be within the window [-30, 0], the closest to t[2] (= 90) within [90, 180], and the closest to t[3] (= 180) within [180, 365]; bounds are inclusive
Kazerooni2016.f <- function(DF,
id,
measures,
tmeasures,
startPoint,
t = c(0, 90, 180),
windows = list(LB = c(30, 0, 0),
UB = c(0, 90, 185))) {

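# install dependencies if missing, then attach dplyr for the pipeline below
if (!requireNamespace("dplyr", quietly = TRUE)) install.packages("dplyr")
if (!requireNamespace("rlang", quietly = TRUE)) install.packages("rlang")
library(dplyr)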

# the measurement times and the start point must share a date class
if (class(DF[[tmeasures]])[1] != class(DF[[startPoint]])[1]) {
stop(
"date type of tmeasures (", class(DF[[tmeasures]])[1],
") != date type of startPoint (", class(DF[[startPoint]])[1], ")"
)
}

if (!is.numeric(t)) {
stop("t parameter must be a numeric vector")
}

if (!is.list(windows)) {
stop("windows must be placed into a list object")
}

# compute difference in time between t0 and all t_j
id         <- rlang::sym(id)
tmeasures  <- rlang::sym(tmeasures)
startPoint <- rlang::sym(startPoint)

DF <- DF %>%
mutate(
time = as.numeric(
difftime(
!!tmeasures, !!startPoint,
tz = "utc", units = "days"
)
)
)

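# loop through each time point in t, place into list
# note: both window bounds are inclusive, so a measurement taken exactly on a
# shared boundary (e.g., day 180) can be selected for two adjacent time points
meas_tn <- vector("list", length(t)) # set empty list
for (i in seq_along(t)) {
meas_tn[[i]] <- DF %>%
filter(time >= t[i] - windows$LB[i] & time <= t[i] + windows$UB[i]) %>%
group_by(!!id) %>%
arrange(abs(time - t[i])) %>% # closest measurement to t[i] first
slice(1)
}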

# count number of time points available for each subject i
do.call(rbind, meas_tn) %>%
arrange(!!id, !!tmeasures) %>%
group_by(!!id) %>%
filter(n() >= length(t)) %>% # must have a measurement at every time point in t
ungroup()
}

### Algorithm in SAS Code

#TODO

### Example in R

Kazerooni2016.df <- Kazerooni2016.f(DF,
id = "PatientICN",
measures = "Weight",
tmeasures = "WeightDateTime",
startPoint = "VisitDateTime")

Distribution of Weight Measurements between Raw and Algorithm-Processed Values


Descriptive statistics by group

| group | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Input | 1 | 1175995 | 207.82 | 48.6 | 202.3 | 204.62 | 44.18 | 0 | 1486.2 | 1486.2 | 0.98 | 5.6 | 0.04 |
| Output | 1 | 71961 | 209 | 47.95 | 204 | 206.02 | 44.33 | 0 | 1233.7 | 1233.7 | 0.88 | 4.27 | 0.18 |

We won’t show a vignette for Kazerooni & Lim 2016, as the algorithm only excludes people, retaining those with a weight measurement in each of the three collection windows.
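
The exclusion step itself can still be illustrated on a few made-up rows. The sketch below is a base R approximation written for illustration, not a call to `Kazerooni2016.f()`: the data frame `toy`, its column names, and the `in_window` list are all hypothetical, and times are assumed to be already expressed as days from treatment start.

```r
# Hypothetical toy data: one row per weight, days measured from treatment start
toy <- data.frame(
  id     = c("A", "A", "A", "A", "B", "B", "C", "C", "C"),
  days   = c(-20, -5, 100, 200, -3, 120, -10, 95, 400),
  weight = c(210, 208, 204, 199, 180, 178, 250, 247, 240)
)

# Collection windows from the published methods, as day offsets (assumed
# conversion: 3, 6, and 12 months taken as 90, 180, and 365 days)
in_window <- list(
  pre  = function(d) d >= -30 & d <= 0,
  mid  = function(d) d >=  90 & d <= 180,
  post = function(d) d >  180 & d <= 365
)

# A person is retained only if every window contains at least one weight
has_all <- sapply(split(toy$days, toy$id), function(d) {
  all(sapply(in_window, function(f) any(f(d))))
})
kept <- names(has_all)[has_all]
kept

# For a retained person, the pre-weight is the most recent weight in the pre window
a <- toy[toy$id == "A", ]
pre_weight <- a$weight[a$days == max(a$days[in_window$pre(a$days)])]
pre_weight
```

Here patient B has no weight in the post window and patient C's last weight falls after day 365, so only patient A is retained.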

The left boxplot shows the raw data from the 2016 PCP visit subjects, while the right boxplot shows the output from running Kazerooni2016.f().