Here my goal is to begin exploring some CGM (continuous glucose monitoring) data to get a better understanding of how to work with these types of data and what their potential is. This was inspired by Irina Gaynanova's website (https://irinagain.github.io/CGM/), where her lab group worked on compiling CGM datasets and calculating various statistics from them. They also created an R package and associated Shiny app for exploring CGM data, which I may use in this exploration.
The data come from this repository (https://github.com/irinagain/Awesome-CGM), where Irina Gaynanova and her colleagues compiled freely available CGM datasets.
The specific datasets I will use below include Aleppo et al. (2017) (https://diabetesjournals.org/care/article/40/4/538/3687/REPLACE-BG-A-Randomized-Trial-Comparing-Continuous) and Hall et al. (2018) (https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.2005143#pbio.2005143.s010).
Required disclaimer: The source of the data is Aleppo et al. (2017) and Hall et al. (2018), but the analyses, content, and conclusions presented herein are solely the responsibility of the authors and have not been reviewed or approved by Aleppo et al. (2017) or Hall et al. (2018).
Here are just some ideas of ways in which I could approach these data:

- Basic visualizations of CGM readings by subject over time
- Daily summaries of average fluctuations, including variation and/or confidence ribbons
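As a rough sketch of the first idea, here is what a by-subject time-series plot might look like. Note that the subject IDs, column names, and simulated values below are all made up for illustration and are not from the actual datasets:

```r
library(tidyverse)

# Simulate CGM-like readings: 2 hypothetical subjects, 5-minute intervals over one day
set.seed(1)
sim_cgm <- expand_grid(
  subject = c("S01", "S02"),
  time = seq(as.POSIXct("2023-01-01 00:00", tz = "UTC"), by = "5 min", length.out = 288)
) %>%
  mutate(glucose = 110 + 25 * sin(as.numeric(time) / 14400) + rnorm(n(), sd = 10))

# One panel per subject, glucose over time
ggplot(sim_cgm, aes(time, glucose)) +
  geom_line() +
  facet_wrap(~subject, ncol = 1) +
  labs(x = "Time", y = "Glucose (mg/dL)")
```

The same simulated frame could feed the daily-summary idea by grouping on the hour of day and adding a `geom_ribbon()` of the across-day variation.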
The R Package
iglu (short for "interpreting glucose") allows the calculation of numerous metrics for blood glucose profiles, which may be more or less useful for helping us analyze and quantify these profiles in various contexts.
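As a minimal sketch of what that looks like, using iglu's bundled example data (the exact function names below are from my reading of the iglu reference and are worth double-checking against the package documentation):

```r
library(iglu)

# Built-in example dataset with columns id, time, and gl (glucose)
data(example_data_5_subject)

# A few per-subject summary metrics
mean_glu(example_data_5_subject)          # mean glucose per subject
sd_glu(example_data_5_subject)            # SD of glucose per subject
in_range_percent(example_data_5_subject)  # % of time in target range(s)
```

Each call returns one row per subject, which is convenient for binding metrics together into a feature table.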
For example, maybe these metrics can be used as features in some type of predictive model for diabetes.
These metrics might also be useful for predicting future glucose levels when implementing automated insulin delivery (e.g. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0253125).
To not reinvent the wheel, here is a good reference from the study above describing the models they used for predicting glucose levels into the near future (at the 15- and 60-minute marks) (https://doi.org/10.1371/journal.pone.0253125.s015). These included ARIMA, support vector regression, gradient-boosted trees, feed-forward neural networks, and recurrent neural networks.
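For a flavor of the simplest of those models, here is a toy ARIMA forecast on a simulated series (not real CGM data; with 5-minute sampling, 15 and 60 minutes ahead correspond to 3 and 12 steps):

```r
# Simulate an autocorrelated glucose-like series around 120 mg/dL
set.seed(42)
glucose <- 120 + as.numeric(arima.sim(list(ar = 0.9), n = 500, sd = 4))

# Fit an AR(1) model and forecast 12 steps (60 minutes) ahead
fit <- arima(glucose, order = c(1, 0, 0))
fc <- predict(fit, n.ahead = 12)

fc$pred[3]   # 15-minute-ahead point forecast
fc$pred[12]  # 60-minute-ahead point forecast
```

Base `stats::arima()` is enough for the sketch; on real data, something like `forecast::auto.arima()` would handle order selection.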
There is also something called a Surveillance Error Grid, which assigns different levels of risk to errors in predicted blood glucose levels. For example, predicting a glucose level of 120 when the actual value is 500 is assigned much higher risk than a prediction of 160 would be (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4764212/).
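To make the intuition concrete, here is a crude sketch of category-based risk. This is NOT the actual Surveillance Error Grid (which uses an empirically derived, continuous risk surface); the zone cutoffs below are just common clinical round numbers used for illustration:

```r
# Bin a glucose value into a coarse clinical zone (illustrative cutoffs only)
glucose_zone <- function(g) {
  cut(g, breaks = c(-Inf, 70, 180, Inf),
      labels = c("hypo", "target", "hyper"))
}

# Flag a prediction as riskier when it lands in the wrong clinical zone
risk_flag <- function(predicted, actual) {
  if (glucose_zone(predicted) == glucose_zone(actual)) "lower risk"
  else "higher risk"
}

risk_flag(120, 500)  # "higher risk": true hyperglycemia read as in-target
risk_flag(160, 180)  # "lower risk": both values fall in the target zone
```

The real grid grades risk continuously rather than by a binary zone match, which is why it distinguishes a prediction of 120 from a prediction of 160 against the same actual value.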
Load the necessary packages and functions here:
```r
library(tidyverse)   # for magic
library(RSQLite)     # for loading SQLite data
library(iglu)        # for CGM metrics
library(factoextra)  # clustering algorithms & visualization
library(ggforce)     # add ellipses to pca plots
library(concaveman)  # for adding hulls to pca plots
library(vegan)       # for NMDS analysis
library(caret)       # for cross-validation
library(ropls)       # for PCA and PLS regression (to install: https://rdrr.io/bioc/ropls/)
library(chemhelper)  # for use with ropls (to install: https://rdrr.io/github/Aariq/chemhelper/f/README.md)
library(ggrepel)     # add labels to a plot that don't overlap
library(glue)        # for formatting strings of text in figures
library(cowplot)     # for plotting multiple plots together
```
```r
# Read the raw data in
raw_hall_data = read_tsv("raw_data/hall-data/hall-data-main.txt")
```

```
## Warning: One or more parsing issues, see `problems()` for details
```
```r
# I get a warning because "low" was used for a few rows of readings,
# maybe because they were too low for the meter.
# What could these 'low' values actually be?
sort(raw_hall_data$GlucoseValue)[1:20]
```

```
##  40 40 40 41 41 41 41 41 42 42 42 42 42 42 43 43 43 43 43 43
```
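Since the smallest parsed values are 40, one hedged way to keep the "low" rows would be to read the glucose column as character and substitute the apparent sensor floor. The tiny inline TSV below stands in for the real file, the `subjectId` column name is made up, and treating "Low" as 40 mg/dL is an assumption based on the minimum observed above:

```r
library(tidyverse)

# Tiny stand-in for the Hall file (real file: raw_data/hall-data/hall-data-main.txt);
# the subjectId column is hypothetical
tsv_text <- "subjectId\tGlucoseValue\nS1\t95\nS1\tLow\nS1\t120\n"

# Read the glucose column as character so "Low" isn't coerced to NA,
# then convert, treating "Low" as the assumed sensor floor of 40 mg/dL
hall_clean <- read_tsv(I(tsv_text),
                       col_types = cols(GlucoseValue = col_character())) %>%
  mutate(GlucoseValue = as.numeric(
    if_else(GlucoseValue == "Low", "40", GlucoseValue)))

hall_clean$GlucoseValue
```

Whether 40 is the right substitute (versus NA, or a value just below the floor) depends on how the metrics downstream treat censored readings.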