Fingerprinting Walking using High Density Accelerometer Data

Lily Koffman

The Problem: Can We Identify an Individual From Their Walking?

Why Do We Care?

Approach

  • Obtain the empirical joint distribution of acceleration and lag acceleration for all possible lags (which can be represented as a series of images)

  • Image partitioning: compute summaries of the joint distribution, use summaries to predict identity

  • Functional regression: use joint distribution in trivariate functional regression to predict identity

Obtaining the joint distribution

Segment data
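
A minimal sketch of the segmentation step, assuming a univariate acceleration signal sampled at 100 samples per second (consistent with \(S = 100\) used later in the talk) and non-overlapping one-second windows. The function name and the toy signal are hypothetical, and the talk's own pipeline is presumably written in R.

```python
def segment_into_seconds(signal, samples_per_second=100):
    """Split a 1D acceleration signal into non-overlapping one-second segments.

    Trailing samples that do not fill a whole second are dropped.
    """
    n_full = len(signal) // samples_per_second
    return [
        signal[j * samples_per_second:(j + 1) * samples_per_second]
        for j in range(n_full)
    ]


# Toy signal (assumption): 250 samples, i.e. 2 full seconds plus 50 leftover samples
toy_signal = [0.01 * k for k in range(250)]
segments = segment_into_seconds(toy_signal)
```

Each element of `segments` plays the role of one second \(j\) for one subject \(i\).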

Obtaining the joint distribution

Determine acceleration, lag acceleration for each segment and lag
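
The pairing step can be sketched as follows, assuming each segment holds \(S = 100\) samples and lags are counted in samples (so a lag of 15 samples corresponds to 0.15 s at 100 samples per second). The helper names are hypothetical.

```python
def lag_pairs(segment, lag):
    """Return (lag acceleration, acceleration) pairs (v(s - u), v(s)) for one lag u."""
    return [(segment[s - lag], segment[s]) for s in range(lag, len(segment))]


def all_lag_pairs(segment):
    """Map every possible lag u = 1, ..., S - 1 to its list of pairs."""
    return {u: lag_pairs(segment, u) for u in range(1, len(segment))}


# Toy segment (assumption): 100 samples whose values equal their index
toy_segment = list(range(100))
pairs_by_lag = all_lag_pairs(toy_segment)
n_pairs_total = sum(len(p) for p in pairs_by_lag.values())
```

A segment of 100 samples yields 99 lags, with \(100 - u\) pairs at lag \(u\) and 4950 pairs in total.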

Image partitioning

Partition acceleration by lag acceleration grid into 2D cells

Image partitioning

Count number of points in each cell
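
A sketch of the counting step for a single lag. The 12 by 12 layout (which gives 144 cells), the range of 0 to 3, and the function name are assumptions made for illustration; the talk states only that there are 144 cells per lag.

```python
def cell_counts(pairs, lower=0.0, upper=3.0, n_bins=12):
    """Count (lag acceleration, acceleration) pairs in an n_bins x n_bins grid.

    Pairs falling outside [lower, upper) on either axis are ignored.
    """
    width = (upper - lower) / n_bins
    counts = [[0] * n_bins for _ in range(n_bins)]
    for lag_acc, acc in pairs:
        if lower <= lag_acc < upper and lower <= acc < upper:
            # min() guards against floating-point edge cases at the upper boundary
            row = min(int((lag_acc - lower) / width), n_bins - 1)
            col = min(int((acc - lower) / width), n_bins - 1)
            counts[row][col] += 1
    return counts


# Toy pairs (assumption): three inside the grid, one outside
toy_pairs = [(0.1, 0.2), (0.1, 0.2), (1.0, 2.9), (3.5, 1.0)]
counts = cell_counts(toy_pairs)
```

Flattening `counts` gives the 144 candidate predictors for that lag and second.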

Image partitioning

Select predictors from cells

  • We have 144 cells for each lag and 99 possible lags – too many predictors!
  • Exploratory analyses indicated that three lags (0.15, 0.30, and 0.45 s) performed well
  • \(144 \times 3 = 432\) potential predictors
  • Remove predictors with near zero variance or few unique values
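
The filtering step can be illustrated with a small sketch. The thresholds (`min_variance`, `min_unique`), the function name, and the toy columns are hypothetical; the talk does not state the cutoffs that were used.

```python
from statistics import pvariance


def keep_informative(columns, min_variance=1e-8, min_unique=3):
    """Keep predictors whose variance and number of unique values exceed the cutoffs."""
    kept = {}
    for name, values in columns.items():
        if pvariance(values) > min_variance and len(set(values)) >= min_unique:
            kept[name] = values
    return kept


# Toy cell counts across four seconds (assumption)
toy_columns = {
    "cell_1": [0, 0, 0, 0],  # zero variance, dropped
    "cell_2": [0, 1, 0, 1],  # only two unique values, dropped
    "cell_3": [0, 2, 5, 1],  # kept
}
kept = keep_informative(toy_columns)
n_candidates = 144 * 3  # cells per lag times number of selected lags
```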

Image partitioning

Fit models

  • We have transformed an abstract problem into a well-defined classification problem
  • Model: \[ Y_{ij}^{i_0} \mid X_{ij1}, \dots, X_{ijG} \] where \(X_{ijg}\) is the number of (acceleration, lag acceleration) pairs for subject \(i\) in grid cell \(g\) at second \(j\), and \(Y_{ij}^{i_0} = 1\) if subject \(i = i_0\) and 0 otherwise
  • Use one vs. rest classification (a separate model for each individual)
  • Fit machine learning models using \(\texttt{tidymodels}\), as well as logistic regression
  • For logistic regression models, use correlation and multiplicity adjusted (CMA) confidence intervals\(^1\) for coefficients to identify the grid cells that are most predictive of identity
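
A toy sketch of the one vs. rest setup: build the indicator \(Y_{ij}^{i_0}\) for each target subject and fit one logistic regression per subject. The plain gradient-ascent fitter below is a stand-in chosen to keep the example self-contained; it is not the \(\texttt{tidymodels}\) workflow from the talk, and all names and data are hypothetical.

```python
import math


def one_vs_rest_labels(subject_ids, target):
    """Y = 1 where the row belongs to the target subject, 0 otherwise."""
    return [1 if sid == target else 0 for sid in subject_ids]


def predict_prob(beta, xi):
    """Predicted probability for one row; beta[0] is the intercept."""
    eta = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit logistic regression by batch gradient ascent on the mean log-likelihood."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            resid = yi - predict_prob(beta, xi)
            grad[0] += resid
            for k in range(p):
                grad[k + 1] += resid * xi[k]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Toy data (assumption): counts in two grid cells for three subjects, three seconds each
subject_ids = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
X = [[5, 0], [4, 1], [5, 1], [0, 5], [1, 4], [1, 5], [1, 1], [0, 1], [1, 0]]

# One model per subject
models = {
    target: fit_logistic(X, one_vs_rest_labels(subject_ids, target))
    for target in sorted(set(subject_ids))
}
```

For a new second, each of the fitted models returns a probability that the second belongs to its subject, which is what makes ranking candidate identities possible.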

Functional regression

  • Instead of summarizing joint distribution, use functional regression of the form: \[{\rm logit}\{p_{ij}^{i_0}\}= \int_{s,u} F\{v_{ij}(s-u),v_{ij}(s),u\}dsdu\]

  • \(Y_{ij}^{i_0} \sim \text{Bernoulli}(p_{ij}^{i_0})\)

  • \(v_{ij}(s-u)\) is acceleration for subject \(i\), second \(j\), at \(s-u\) (i.e. lag acceleration)

  • \(v_{ij}(s)\) is acceleration for subject \(i\), second \(j\), at \(s\) (i.e. acceleration)

  • \(u = 1, \dots, S-1 = 99\) (all 99 lags), \(s= u +1, \dots, S = 100\)

  • \(F(\cdot, \cdot, \cdot)\) takes values at every point in domain of 3D images (acceleration, lag acceleration, and lag)

  • Implement model using \(\texttt{mgcv::gam}\) after manipulating empirical joint distribution into matrices of acceleration, lag acceleration, and lag
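
To make the index ranges concrete, here is a sketch that evaluates a discretized version of the double integral as a sum over all lags \(u\) and times \(s > u\). The surface `toy_F` is a hypothetical placeholder; in the talk \(F\) is estimated from data with \(\texttt{mgcv::gam}\), which this sketch does not attempt to reproduce.

```python
import math


def linear_predictor(segment, F):
    """Sum F(v(s - u), v(s), u) over lags u = 1, ..., S - 1 and times s = u + 1, ..., S."""
    S = len(segment)
    total = 0.0
    n_terms = 0
    for u in range(1, S):
        for s in range(u, S):  # 0-based index for s = u + 1, ..., S
            total += F(segment[s - u], segment[s], u)
            n_terms += 1
    return total, n_terms


def toy_F(lag_acc, acc, u):
    """Hypothetical smooth surface, used only to illustrate the computation."""
    return 0.001 * (acc - lag_acc) / u


# Toy segment (assumption): one second of a sinusoid around 1
toy_segment = [1.0 + 0.5 * math.sin(2 * math.pi * k / 50) for k in range(100)]
eta, n_terms = linear_predictor(toy_segment, toy_F)
p = 1.0 / (1.0 + math.exp(-eta))  # inverse logit gives the Bernoulli probability
```

With \(S = 100\) the sum has \(\sum_{u=1}^{99} (100 - u) = 4950\) terms, one for every (acceleration, lag acceleration, lag) triple in the second.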

Application

Two datasets:

  • Indiana University (IU)\(^2\): 32 subjects, 8 min walking per subject
  • Zhejiang University (ZJU)\(^3\): 153 subjects, two sessions (trials) at least one week and up to six months apart, 1 min walking per subject
    • Task 1, within-session prediction: train on 75\(\%\) of seconds in session 1, predict on the other 25\(\%\)
    • Task 2, out-of-session prediction: train on session 1, predict on session 2
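
A sketch of the within-session split for one subject, assuming seconds are assigned to training at random. The talk does not say whether the split was random or chronological, so the shuffling, the seed, and the function name are assumptions.

```python
import random


def split_seconds(n_seconds, train_fraction=0.75, seed=0):
    """Randomly assign second indices to training and test sets."""
    rng = random.Random(seed)
    idx = list(range(n_seconds))
    rng.shuffle(idx)
    n_train = int(round(train_fraction * n_seconds))
    return sorted(idx[:n_train]), sorted(idx[n_train:])


# One minute of walking (60 seconds), as in one ZJU session
train_idx, test_idx = split_seconds(60)
```

For the out-of-session task no split is needed: all seconds from session 1 are used for training and all seconds from session 2 for testing.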

Results

| Data and Task | Strategy | Rank-1 Accuracy | Rank-5 Accuracy | Rank-1 Correct | Rank-5 Correct |
|---|---|---|---|---|---|
| IU | Image partitioning - logistic | 1.00 | 1.00 | 32 | 32 |
| IU | Image partitioning - ML | 0.97 | 1.00 | 31 | 32 |
| IU | Functional | 1.00 | 1.00 | 32 | 32 |
| ZJU S1 | Image partitioning - logistic | 0.93 | 0.99 | 140 | 151 |
| ZJU S1 | Image partitioning - ML | 0.71 | 0.97 | 109 | 149 |
| ZJU S1 | Functional | 0.98 | 1.00 | 150 | 153 |
| ZJU S1S2 | Image partitioning - logistic | 0.41 | 0.75 | 63 | 114 |
| ZJU S1S2 | Image partitioning - ML | 0.54 | 0.76 | 82 | 117 |
| ZJU S1S2 | Functional | 0.53 | 0.69 | 81 | 106 |
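
One plausible way to compute rank-\(k\) results like those above is sketched here. It assumes that, for each true subject, the predicted probabilities from every one vs. rest model are averaged over that subject's test seconds and then ranked; the talk does not spell out this definition, and the function name and toy values are hypothetical.

```python
def rank_k_correct(mean_probs, k):
    """Count subjects whose own model is among the k highest mean predicted probabilities.

    mean_probs[true_subject][model_subject] is the mean predicted probability from
    model_subject's classifier over true_subject's test seconds.
    """
    correct = 0
    for true_subject, scores in mean_probs.items():
        ranked = sorted(scores, key=scores.get, reverse=True)
        if true_subject in ranked[:k]:
            correct += 1
    return correct


# Toy mean probabilities for three subjects (assumption)
toy_mean_probs = {
    "a": {"a": 0.90, "b": 0.05, "c": 0.05},
    "b": {"a": 0.50, "b": 0.40, "c": 0.10},
    "c": {"a": 0.10, "b": 0.20, "c": 0.70},
}
rank1_correct = rank_k_correct(toy_mean_probs, 1)
rank2_correct = rank_k_correct(toy_mean_probs, 2)
rank1_accuracy = rank1_correct / len(toy_mean_probs)
```

Under this definition, accuracy is the number of correctly ranked subjects divided by the number of subjects.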

Inference

Fingerprints

Well-predicted subject

Fingerprints

Poorly-predicted subject

Revisiting problem statement

Thank you!

Acknowledgements

  • Andrew Leroux, PhD, University of Colorado
  • Jaroslaw Harezlak, PhD, Indiana University
  • Yan Zhang, ScM, Johns Hopkins Bloomberg School of Public Health
  • Ciprian Crainiceanu, PhD, Johns Hopkins Bloomberg School of Public Health

Footnotes

  1. Crainiceanu, C. M., Goldsmith, J., Leroux, A., & Cui, E. (2023). Functional Data Analysis with R. Springer New York, NY, USA.

  2. Karas, M., Urbanek, J., Crainiceanu, C., Harezlak, J., & Fadel, W. (2021). Labeled raw accelerometry data captured during walking, stair climbing and driving (version 1.0.0). PhysioNet. https://doi.org/10.13026/51h0-a262.

  3. Zhang, Y., Pan, G., Jia, K., Lu, M., Wang, Y., & Wu, Z. (2015). Accelerometer-Based Gait Recognition by Sparse Representation of Signature Points With Clusters. IEEE Transactions on Cybernetics, 45(9), 1864–1875.