Walking Fingerprinting

In a large epidemiological study

Lily Koffman

4/10/25

Follow along

https://shorturl.at/FdHXC

Introduction

  • 4th-year PhD candidate in biostatistics at Johns Hopkins
  • Advisor: Ciprian Crainiceanu
  • Wearable and Implantable Technology (WIT) research group
  • Interests: accelerometry, functional data, walking

Background and motivation

Problem description

What do I mean by “walking fingerprinting”?

Applications

Backing up: accelerometry

  • Accelerometry: acceleration recorded by a wearable device

  • Sampled at 15 to 100 observations per second (Hz) along 3 axes

  • Measured in \(g\) units: 1 \(g\) = 9.81 \(m/s^2\)
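A minimal sketch, assuming that the scalar "acceleration" used later is the vector magnitude of the three axes. The slides do not say how the axes are combined. The helper names are hypothetical.

```python
import math

G = 9.81  # 1 g in m/s^2, as on the slide


def vector_magnitude(x, y, z):
    """Collapse tri-axial samples into one series: sqrt(x^2 + y^2 + z^2)."""
    return [math.sqrt(a * a + b * b + c * c) for a, b, c in zip(x, y, z)]


def ms2_to_g(values):
    """Convert accelerations from m/s^2 to g units."""
    return [v / G for v in values]
```

A device at rest with gravity along one axis has magnitude 1 \(g\). For example, `vector_magnitude([0.0], [0.0], [1.0])` gives `[1.0]`.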

Accelerometry data

Outline

  • Transform accelerometry time series into scalar predictors
    • Compute the joint distribution of acceleration and lagged acceleration
    • Represent the joint distribution as a 2D image
    • Partition the image into cells; the number of points in each cell is a scalar summary
  • Use the scalar summaries in one vs. rest classification
  • Functional regression approach
  • Applications to datasets
  • Results and next steps

Transform accelerometry into scalar predictors

Segment data into 1-s chunks
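A minimal sketch of the segmentation step. It assumes a 100 Hz signal, matching \(S = 100\) observations per second in the functional model, and drops a trailing partial second. `segment_seconds` is a hypothetical helper name.

```python
def segment_seconds(signal, hz=100):
    """Split a 1-D acceleration series into non-overlapping 1-s chunks.

    Each chunk has `hz` samples; an incomplete final second is dropped
    (an assumption).
    """
    n_full = len(signal) // hz
    return [signal[k * hz:(k + 1) * hz] for k in range(n_full)]
```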

Zoom in on one second

Plot acceleration, lag acceleration

Transform to 2D grid

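The grid step can be sketched as follows for one second and one lag \(u\) (given in samples). Several choices here are illustrative assumptions, not the study's settings:

- a cell width of 0.25 \(g\)
- a grid range of 0 to 3 \(g\)
- dropping points outside the grid

```python
def grid_counts(second, lag, cell_width=0.25, max_g=3.0):
    """Count points (v(s), v(s - lag)) in each cell of a square grid.

    second: acceleration values (in g) for one 1-s chunk
    lag: lag u, in samples
    Returns a flat, row-major list of counts indexed by
    (bin of v(s), bin of v(s - lag)). Points outside [0, max_g) in either
    coordinate are dropped (an assumption).
    """
    n_bins = int(round(max_g / cell_width))
    counts = [0] * (n_bins * n_bins)
    for s in range(lag, len(second)):
        a, b = second[s], second[s - lag]
        if 0 <= a < max_g and 0 <= b < max_g:
            i = min(int(a // cell_width), n_bins - 1)
            j = min(int(b // cell_width), n_bins - 1)
            counts[i * n_bins + j] += 1
    return counts
```

Each entry of the returned list is one scalar predictor: the number of (acceleration, lagged acceleration) points falling in that cell.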

Derive predictors from grid

Derive predictors: repeat for all seconds

Derive predictors: repeat for all subjects
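Repeating the grid step for every second and every subject gives one row of scalar predictors per second. Here is a sketch of that bookkeeping. `counts_for_lag` is a hypothetical parameter standing in for any per-second, per-lag grid-count function, and the set of lags is up to the analyst.

```python
def build_design(data, lags, counts_for_lag):
    """Stack per-second predictors for every subject.

    data: dict mapping subject id -> list of 1-s acceleration chunks
    lags: lags u (in samples) to include
    counts_for_lag: function (chunk, lag) -> list of grid-cell counts
    Returns (labels, rows): one label and one row of predictors per second;
    each row concatenates the cell counts for every lag.
    """
    labels, rows = [], []
    for subject_id, seconds in data.items():
        for chunk in seconds:
            row = []
            for lag in lags:
                row.extend(counts_for_lag(chunk, lag))
            labels.append(subject_id)
            rows.append(row)
    return labels, rows
```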

Summary

Fit models

Scalar predictors and one vs. rest classification
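Below is a minimal, self-contained sketch of one vs. rest identification:

1. Fit one binary logistic model per subject (1 for that subject's seconds, 0 for everyone else's).
2. Average each model's per-second probabilities over a test subject's seconds.
3. Predict the subject with the highest average.

The plain gradient-descent fitter and the averaging rule are illustrative assumptions; in practice a library implementation would be used.

```python
import math


def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient-descent logistic regression (illustration only)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def fit_one_vs_rest(X, labels):
    """One binary model per subject: 1 for that subject's seconds, 0 otherwise."""
    return {s: fit_logistic(X, [1 if lab == s else 0 for lab in labels])
            for s in sorted(set(labels))}


def identify(models, test_seconds):
    """Average each model's probability over the test seconds; return the
    best-scoring subject and all scores."""
    scores = {}
    for s, (w, b) in models.items():
        probs = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
                 for x in test_seconds]
        scores[s] = sum(probs) / len(probs)
    return max(scores, key=scores.get), scores
```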

Aside: functional regression approach

\[\text{logit}(p_{ij}^{i_0}) = \beta_0^{i_0} + \int_{u=1}^{S}\int_{s=u}^{S} F_{i_0}\{v_{ij}(s), v_{ij}(s-u), u\}\,ds\,du\]

where \(p_{ij}^{i_0}\) is the probability that second \(j\) of subject \(i\) belongs to subject \(i_0\), the lag is \(u = 1, \dots, S\), and \(S = 100\) is the number of observations per second.

\(v_{ij}(s)\): acceleration at centisecond \(s\) for subject \(i\) in second \(j\)

\(F_{i_0}(\cdot, \cdot, \cdot)\): trivariate smooth function
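For one second of data, the double integral can be approximated by a double sum over lags and time points. This sketch makes three assumptions:

- unit spacing
- 0-based indexing within the second
- skipping pairs whose lagged index would fall before the start of the second

`F` is any Python function standing in for the estimated smooth \(F_{i_0}\).

```python
import math


def functional_predictor(v, beta0, F):
    """Riemann-sum approximation of the linear predictor
    beta0 + sum_u sum_s F(v[s], v[s - u], u) for one second of data.

    v: acceleration samples for one second (0-based, length S)
    F: function of (current value, lagged value, lag)
    """
    S = len(v)
    total = beta0
    for u in range(1, S):
        for s in range(u, S):
            total += F(v[s], v[s - u], u)
    return total


def inv_logit(eta):
    # convert the linear predictor to a probability (numerically stable)
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)
```

Suppose \(F_{i_0}\) is piecewise constant on the grid cells at each lag. Then the double sum reduces to a weighted sum of cell counts, which is one way to see the link between this model and the cell-count predictors.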

Both methods work!

“Fingerprints” distinguish individuals

But…

  • Small samples (\(n = 30\) and \(n = 153\))
  • Walking periods are known in advance: participants were observed while walking
  • Participants are young and healthy
  • Can we adapt the method to a larger, more diverse sample of free-living data?

Application to NHANES

NHANES

  • Continuous survey of ~5,000 Americans per year, released in 2-year cycles
  • 2011-2014: wrist-worn accelerometers included in the protocol
  • 7 full days of free-living data from a nationally representative sample of \(n > 15{,}000\) Americans
  • \(>10\) TB of raw data

Methods

  • Use ADEPT (ADaptive Empirical Pattern Transformation)¹ to identify seconds with walking
  • Keep only walking bouts at least 10 seconds long
  • Construct fingerprints from the filtered bouts
  • Train and test models on subsets of varying size

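ADEPT (available as the `adept` R package) does considerably more than what is shown here. This sketch only illustrates the underlying idea: matching a stride template, rescaled to several durations, against the signal by correlation. The template, the set of lengths and the helper names are assumptions, and no walking threshold is applied.

```python
import math


def rescale(template, length):
    """Linearly interpolate a template to `length` samples (length >= 2)."""
    m = len(template)
    out = []
    for k in range(length):
        pos = k * (m - 1) / (length - 1)
        lo = int(pos)
        hi = min(lo + 1, m - 1)
        frac = pos - lo
        out.append(template[lo] * (1 - frac) + template[hi] * frac)
    return out


def corr(x, y):
    """Pearson correlation; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_match(signal, template, lengths):
    """Slide each rescaled template along the signal and return
    (start, length, r) for the highest correlation found."""
    best = (None, None, -1.0)
    for length in lengths:
        t = rescale(template, length)
        for start in range(len(signal) - length + 1):
            r = corr(signal[start:start + length], t)
            if r > best[2]:
                best = (start, length, r)
    return best
```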
Finding walking

Raw data

ADEPT templates

ADEPT walking identification: example

Check some results

Per-subject walking time

Walking bout: \(\geq 10\) s in which at least every other second contains steps
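A sketch of the bout rule applied to a per-second step indicator. It reads "at least every other second contains steps" as "never two step-free seconds in a row", and assumes bouts start and end on a second with steps. Both readings are assumptions.

```python
def walking_bouts(has_steps, min_len=10):
    """Return (start, end) second indices (end inclusive) of walking bouts.

    has_steps: per-second truthy/falsy indicator of detected steps
    A bout never contains two consecutive step-free seconds and must span
    at least `min_len` seconds.
    """
    bouts = []
    start = last = None
    for t, step in enumerate(has_steps):
        if not step:
            continue
        if start is None:
            start = last = t
        elif t - last <= 2:  # at most one step-free second in between
            last = t
        else:
            if last - start + 1 >= min_len:
                bouts.append((start, last))
            start = last = t
    if start is not None and last - start + 1 >= min_len:
        bouts.append((start, last))
    return bouts
```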

Train/test partitioning & model fitting

Train/test paradigms

  • Random: 3 minutes of walking sampled at random from all of a participant's walking seconds; 75% used for training, 25% for testing. \(n = 13{,}367\) (85% of the sample)
  • Temporal: 2 min 15 s of walking from one day used for training, 45 s from a later day used for testing (the same 75%/25% split of 3 minutes). \(n = 10{,}770\) (69% of the sample)
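Both paradigms can be sketched per participant. Both use 180 s in total, since 135 s + 45 s = 180 s. Which seconds are taken, and which days are chosen in the temporal case, are assumptions here; participants without enough walking return `None` and are excluded.

```python
import random


def random_split(walking_seconds, total=180, train_frac=0.75, seed=0):
    """Random paradigm: sample `total` walking seconds, split 75% / 25%."""
    if len(walking_seconds) < total:
        return None  # not enough walking: participant excluded
    rng = random.Random(seed)
    chosen = rng.sample(walking_seconds, total)
    n_train = int(round(train_frac * total))
    return chosen[:n_train], chosen[n_train:]


def temporal_split(seconds_by_day, n_train=135, n_test=45):
    """Temporal paradigm: 135 s from one day for training, 45 s from a
    later day for testing. Uses the first qualifying day and the next
    later qualifying day (an assumption)."""
    days = sorted(seconds_by_day)
    for i, day in enumerate(days):
        if len(seconds_by_day[day]) >= n_train:
            for later in days[i + 1:]:
                if len(seconds_by_day[later]) >= n_test:
                    return (seconds_by_day[day][:n_train],
                            seconds_by_day[later][:n_test])
    return None  # participant excluded from this paradigm
```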

Model fitting

  • Variable screening: near-zero-variance predictors removed for all models with recipes::step_nzv() (tidymodels)
  • Models fit: logistic regression, lasso, random forest, extreme gradient boosting, linear and nonlinear scalar-on-function regression
  • All models first fit on subsets of \(n = 100\); the best models then fit on larger subsets of \(n = 500, 1000, 5000, 10000\) and the full sample \(N\)
  • Evaluate rank-1, rank-5, rank-1%, and rank-5% accuracy: the proportion of subjects whose true identity is among the top 1, top 5, top 1%, or top 5% of predicted candidates
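This is a rough re-implementation, under our reading, of the near-zero-variance rule described in the recipes::step_nzv documentation (defaults `freq_cut = 95/5`, `unique_cut = 10`); boundary handling may differ from the package. Grid-cell counts that are almost always zero are the kind of predictor such a filter removes.

```python
from collections import Counter


def near_zero_var(columns, freq_cut=95 / 5, unique_cut=10):
    """Return indices of columns flagged as near-zero variance.

    columns: list of predictor columns (each a list of values)
    A column is flagged if it has one distinct value, or if the ratio of
    the most common to the second most common value exceeds freq_cut AND
    the percentage of distinct values is at most unique_cut.
    """
    flagged = []
    for idx, col in enumerate(columns):
        top_two = Counter(col).most_common(2)
        if len(top_two) < 2:
            flagged.append(idx)  # zero variance
            continue
        freq_ratio = top_two[0][1] / top_two[1][1]
        pct_unique = 100 * len(set(col)) / len(col)
        if freq_ratio > freq_cut and pct_unique <= unique_cut:
            flagged.append(idx)
    return flagged
```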
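A sketch of the rank-based metrics. It assumes each true subject has a score for every candidate model, where higher means more likely, and that rank-\(p\)% means the top \(\lceil p/100 \times \text{candidates} \rceil\). Ties are broken by candidate order.

```python
import math


def rank_k_accuracy(scores, k):
    """scores: dict true_subject -> dict candidate -> score.
    Fraction of subjects whose true identity is among the top k candidates."""
    hits = 0
    for truth, cand in scores.items():
        top = sorted(cand, key=cand.get, reverse=True)[:k]
        hits += truth in top
    return hits / len(scores)


def rank_pct_accuracy(scores, pct):
    """Rank-pct% accuracy: k is pct% of the number of candidates (at least 1)."""
    n_candidates = len(next(iter(scores.values())))
    k = max(1, math.ceil(pct / 100 * n_candidates))
    return rank_k_accuracy(scores, k)
```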

Results

Larger subsets: logistic regression

Oversampling

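In one vs. rest models, the target subject is a small fraction of the training data (class imbalance). We can oversample the target subject's seconds to make up a set percentage of the training data and check whether this improves performance.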
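A minimal sketch of oversampling by resampling with replacement. The number of target rows \(k\) solves \(k / (k + n_{\text{rest}}) = p\), giving \(k = p\, n_{\text{rest}} / (1 - p)\). The 20% default is a hypothetical choice, and at least one target row is assumed.

```python
import random


def oversample_target(X, y, target_frac=0.2, seed=0):
    """Resample rows with y == 1 (with replacement) until they make up
    about `target_frac` of the training data; other rows are unchanged.
    If the target already exceeds target_frac, nothing is added."""
    rng = random.Random(seed)
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    k = round(target_frac * len(neg) / (1 - target_frac))
    extra = [rng.choice(pos) for _ in range(max(0, k - len(pos)))]
    X_new = neg + pos + extra
    y_new = [0] * len(neg) + [1] * (len(pos) + len(extra))
    return X_new, y_new
```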

Longer training

Does increasing the walking time observed for each subject to 6 minutes improve performance? Intuitively, it should.

Fingerprints

Next steps

Regress fingerprint on outcomes

  • NHANES data are rich in participant information
  • Age, sex, diseases, mortality, etc.
  • Can we associate step or walking patterns with comorbidities and demographics?

Preliminary analyses

Calculate, for each subject, the proportion of time spent in each grid cell and fit a separate regression for each cell \(c\):

\[\text{time in cell}_{ic} = \beta_{0c} + \beta_{1c}\,\text{mortality at 5 years}_i \]

We fit this for every cell and plot the estimates. Greyed-out cells were not significant after Bonferroni correction.

Interpret red cells as: the difference in the proportion of time spent in cell \(c\) between participants who died within 5 years and those who did not (the slope \(\beta_1\) for cell \(c\)). Next step: image-on-scalar regression
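With a single 0/1 predictor, ordinary least squares reduces to a difference in group means with a pooled residual variance. A minimal sketch of the per-cell fits and the Bonferroni rule follows. It uses a normal approximation for the p-values, which assumes large samples. The function name and input layout are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def per_cell_mortality(time_in_cell, died, alpha=0.05):
    """time_in_cell: list over cells; each entry is a list of per-subject
    proportions of time in that cell.
    died: 0/1 five-year mortality per subject (both groups non-empty).
    Returns (beta1, p_value, significant) per cell, where significance uses
    the Bonferroni threshold alpha / number of cells."""
    n_cells = len(time_in_cell)
    results = []
    for y in time_in_cell:
        y1 = [v for v, d in zip(y, died) if d == 1]
        y0 = [v for v, d in zip(y, died) if d == 0]
        m1, m0 = fmean(y1), fmean(y0)
        beta1 = m1 - m0  # OLS slope for a binary predictor
        ssr = sum((v - m1) ** 2 for v in y1) + sum((v - m0) ** 2 for v in y0)
        se = math.sqrt(ssr / (len(y) - 2) * (1 / len(y1) + 1 / len(y0)))
        z = beta1 / se if se > 0 else 0.0  # degenerate case: not significant
        p = 2 * (1 - NormalDist().cdf(abs(z)))
        results.append((beta1, p, p < alpha / n_cells))
    return results
```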

Thank you!