Using Functional Data Analysis
4/15/25
What do I mean by “walking fingerprinting”?
Accelerometry: collected from a wearable device
Between 15 and 100 observations per second in 3 dimensions
Measured in \(g\) units, where \(1g = 9.81\) \(m/s^2\)
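As a minimal sketch of the data format (simulated values; the x, y, z column names and 100 Hz sampling rate are assumptions), tri-axial accelerometry is often collapsed to a single vector-magnitude signal in \(g\) units:

library(dplyr)

acc_df =
  tibble(
    time = seq(0, 0.99, by = 0.01),      # one second of data at 100 Hz
    x = rnorm(100, mean = 0, sd = 0.1),  # simulated axes, in g units
    y = rnorm(100, mean = 1, sd = 0.1),  # gravity roughly along one axis
    z = rnorm(100, mean = 0, sd = 0.1)
  ) %>%
  mutate(vm = sqrt(x^2 + y^2 + z^2))     # vector magnitude, in g units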
Hat tip to Edward Gunning for the idea for these figures
\[\text{logit}(p_{ij}^{i_0}) =\beta_0^{i_0} + \int_{u=1}^S\int_{s=u}^SF_{i_0}\{ v_{ij}(s), v_{ij}(s-u), u\}\,ds\,du \]
\(p_{ij}^{i_0}\): probability that second \(j\) from subject \(i\) belongs to subject \(i_0\)
\(u = 1, \dots, S = 100\) (number of observations per second)
\(v_{ij}(s)\): acceleration at centisecond \(s\) for subject \(i\) in second \(j\)
\(F_{i_0}(\cdot, \cdot, \cdot)\): trivariate smooth function
Toy example
4 observations per second (i.e., one observation every 1/4 second)
2 seconds
1 individual
data = \[\begin{bmatrix} v_1(1) & v_1(2) & v_1(3) & v_1(4) \\ v_2(1) & v_2(2) & v_2(3) & v_2(4) \\ \end{bmatrix} \]
acceleration matrix = \[\begin{bmatrix} v_1(2) & v_1(3) & v_1(4) & v_1(3) & v_1(4) & v_1(4) \\ v_2(2) & v_2(3) & v_2(4) & v_2(3) & v_2(4) & v_2(4) \\ \end{bmatrix} \]
lag acceleration matrix = \[\begin{bmatrix} v_1(1) & v_1(1) & v_1(1) & v_1(2) & v_1(2) & v_1(3) \\ v_2(1) & v_2(1) & v_2(1) & v_2(2) & v_2(2) & v_2(3) \\ \end{bmatrix} \]
lag matrix = \[\begin{bmatrix} 1 & 2 & 3 & 1 & 2 & 1\\ 1 & 2 & 3 & 1 & 2 & 1\\\end{bmatrix} \]
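A minimal code sketch of this construction (the placeholder matrix V stands in for the toy data; rows are seconds, columns are the within-second observations):

S = 4                                     # observations per second in the toy example
V = matrix(1:8, nrow = 2, byrow = TRUE)   # toy data: rows = seconds, cols = v_j(1), ..., v_j(4)

# all (lagged index l, current index s) pairs with l < s; the lag is u = s - l
pairs = expand.grid(l = 1:(S - 1), s = 1:S)
pairs = pairs[pairs$s > pairs$l, ]
pairs = pairs[order(pairs$l, pairs$s), ]

acc_mat     = V[, pairs$s, drop = FALSE]  # acceleration matrix: v_j(s)
lag_acc_mat = V[, pairs$l, drop = FALSE]  # lag acceleration matrix: v_j(s - u)
lag_mat     = matrix(pairs$s - pairs$l, nrow = nrow(V), ncol = nrow(pairs), byrow = TRUE)  # lag matrix: u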
Define walking bout: \(\geq\) 10s where at least every other second has steps
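A hedged sketch of how such bouts might be flagged (an assumed helper, not the exact implementation): given a per-second step indicator, a run is broken whenever two adjacent seconds are both step-free, and only runs of at least 10 seconds are kept:

find_walking_bouts = function(has_steps, min_len = 10) {
  n = length(has_steps)
  prev = c(FALSE, has_steps[-n])           # step indicator for the previous second
  broken = !has_steps & !prev              # two step-free seconds in a row end a run
  run_id = cumsum(broken)                  # label the resulting maximal runs
  run_len = ave(rep(1L, n), run_id, FUN = sum)
  run_len >= min_len                       # TRUE for seconds inside a qualifying bout
}

# a 12-second stretch with a single skipped second still counts as one bout
find_walking_bouts(c(rep(TRUE, 5), FALSE, rep(TRUE, 6), rep(FALSE, 4)))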
Remove near-zero-variance grid cells with recipes::step_nzv() (part of tidymodels)
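A minimal recipes sketch (assuming train_data holds the outcome plus the grid-cell count columns):

library(recipes)

nzv_rec =
  recipe(outcome ~ ., data = train_data) %>%
  step_nzv(all_predictors()) %>%
  prep()

train_filtered = bake(nzv_rec, new_data = NULL)  # grid cells with near-zero variance dropped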
Linear scalar-on-function regression (SoFR) model: \[\text{logit}(p_{ij}^{i_0}) =\beta_0^{i_0} + \int_S \beta_1(s) X_{ij}(s)\,ds\] where \(s \in \{1, 2, \dots, 432\}\) indexes grid cells and \(X_{ij}(s)\) is the number of points for subject \(i\) in second \(j\) in grid cell \(s\)
Nonlinear scalar-on-function regression (SoFR) model: \[\text{logit}(p_{ij}^{i_0}) =\beta_0^{i_0} + \int_S F\big(X_{ij}(s), s\big)\,ds\] where \(F\) is a bivariate smooth function (the additive functional term fit by af() below)
library(dplyr)

# predictor matrix: one row per second, one column per grid cell (432 cells)
pred_mat =
  train_data %>%
  select(starts_with("x")) %>%
  as.matrix()

# scalar covariates plus the functional predictor stored as a matrix column
xdf =
  train_data %>%
  select(-starts_with("x")) %>%
  mutate(pred_mat = pred_mat)

if (linear) {
  # linear SoFR: functional linear term via lf()
  pfr_fit =
    refund::pfr(
      outcome ~ lf(pred_mat, argvals = seq(1, 432)),
      family = binomial(link = "logit"),
      method = "REML",
      data = xdf
    )
} else {
  # nonlinear SoFR: additive functional term via af()
  pfr_fit =
    refund::pfr(
      outcome ~ af(pred_mat, argvals = seq(1, 432)),
      family = binomial(link = "logit"),
      method = "REML",
      data = xdf
    )
}
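Predicted probabilities that each held-out second belongs to subject \(i_0\) can then be obtained with predict(); a sketch assuming a test set test_xdf built the same way as xdf:

pred_probs = predict(pfr_fit, newdata = test_xdf, type = "response")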
Because the classes are imbalanced (the predicted subject contributes only a small share of the seconds), we can oversample the predicted subject to a fixed percentage of the training data and see whether this improves the model
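A hedged sketch of that oversampling step (the is_target indicator for the predicted subject and the 30% default are assumptions):

library(dplyr)

oversample_target = function(train_data, target_prop = 0.3) {
  target_rows = filter(train_data, is_target == 1)  # seconds from the predicted subject
  other_rows  = filter(train_data, is_target == 0)  # seconds from all other subjects
  # resample the predicted subject's seconds (with replacement) so that they
  # make up target_prop of the augmented training data
  n_needed = round(target_prop / (1 - target_prop) * nrow(other_rows))
  bind_rows(other_rows, slice_sample(target_rows, n = n_needed, replace = TRUE))
}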
Increase the amount of walking observed for each subject to 6 minutes per person? The intuition is that more data per subject should improve model performance
Calculate, for each subject, the proportion of time spent in each grid cell and fit a separate regression for each grid cell \(c\):
\[\text{time in cell}_i = \beta_0 + \beta_1\text{mortality at 5 years}_i \]
We do this for each cell, then plot the results. Greyed-out cells were not significant after Bonferroni correction.
Interpret red cells as: the estimated difference in the proportion of time spent in cell \(c\) between subjects who died within 5 years and those who did not. Next step: image-on-scalar regression
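A sketch of this cell-wise procedure (subj_df, with one row per subject, columns prop_cell_1, ..., prop_cell_432, and mortality_5yr, is a hypothetical layout):

cell_cols = grep("^prop_cell_", names(subj_df), value = TRUE)

cell_results = do.call(rbind, lapply(cell_cols, function(cell) {
  fit = lm(subj_df[[cell]] ~ subj_df$mortality_5yr)  # time in cell ~ 5-year mortality
  coefs = summary(fit)$coefficients
  data.frame(cell = cell, beta1 = coefs[2, 1], p_value = coefs[2, 4])
}))

# Bonferroni: a cell is significant only if p < 0.05 / number of cells tested
cell_results$significant = cell_results$p_value < 0.05 / length(cell_cols)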