Generating Parquet Files with GGIR
GGIR can export its key output data to a single consolidated Apache Parquet file. This file is designed to be uploaded directly to the GGIR Dashboard, where you can interactively explore and visualise your study results without any additional coding or data processing.
Prerequisites
The arrow R package must be installed. GGIR requires R ≥ 4.0.
install.packages("arrow")
For full documentation on GGIR installation and configuration, see the official GGIR documentation and the CRAN package page.
How to Enable Parquet Export
Set the parameter save_dashboard_parquet to TRUE in your GGIR() call. The Parquet file is generated automatically after all requested parts and reports have completed.
Important: All five parts (1–5) and the reports for parts 2, 4, and 5 must have run successfully before the export can produce meaningful output, because the Parquet file consolidates data from across these reports.
save_dashboard_parquet defaults to FALSE. No Parquet file is generated unless you explicitly set it to TRUE.
Code Examples
Minimal example
library(GGIR)
GGIR(
  mode = 1:5,
  datadir = "C:/mystudy/mydata",
  outputdir = "C:/myresults",
  do.report = c(2, 4, 5),
  save_dashboard_parquet = TRUE
)
Combined with other GGIR parameters
library(GGIR)
GGIR(
  mode = 1:5,
  datadir = "C:/mystudy/mydata",
  outputdir = "C:/myresults",
  studyname = "my_study",
  do.report = c(2, 4, 5),
  threshold.lig = 40,
  threshold.mod = 100,
  threshold.vig = 400,
  save_dashboard_parquet = TRUE
)
Output File
When enabled, the following file is created in the results directory:
<outputdir>/output_<studyname>/results/participant_id.parquet
Replace <outputdir> and <studyname> with the values you passed to GGIR(). If you did not set a studyname, GGIR uses the input folder name.
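If you are unsure which study name GGIR resolved, one option is to search the output directory for the file name given above. The following is a minimal Python sketch and not part of GGIR: the helper name find_dashboard_parquet is hypothetical, and the temporary folder only imitates the documented layout.

```python
import tempfile
from pathlib import Path


def find_dashboard_parquet(outputdir):
    """Return every <outputdir>/output_*/results/participant_id.parquet found.

    Hypothetical helper: it relies only on the path pattern documented above.
    """
    return sorted(Path(outputdir).glob("output_*/results/participant_id.parquet"))


# Demo on a throwaway folder that imitates the documented layout.
with tempfile.TemporaryDirectory() as tmp:
    results = Path(tmp) / "output_my_study" / "results"
    results.mkdir(parents=True)
    (results / "participant_id.parquet").touch()

    found = find_dashboard_parquet(tmp)
    relative = [p.relative_to(tmp).as_posix() for p in found]
    print(relative)
```

An empty list means no export was found under that directory, in which case the Troubleshooting section is the next place to look.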
Using with the Dashboard
Once GGIR has finished processing and the Parquet file has been generated:
- Navigate to the GGIR Dashboard.
- Upload your participant_id.parquet file.
- The dashboard will automatically parse the file and present interactive visualisations of your study data.
No additional software, coding, or data manipulation is required.
Available Views
After uploading, you can explore your data across three pages:
- Upload: Preview the raw data table, inspect the column schema, and review embedded file metadata.
- Dashboard: Study-level charts organized into Data Quality, Sleep Analysis, and Physical Activity tabs. Tabs are automatically enabled based on the columns present in your file.
- Epoch Explorer: Participant-level, day-by-day epoch time series. Requires the nested epochs column (included by default when Part 5 time series output is saved).
What the Parquet File Contains
The Parquet file is a single consolidated table at the day level (one row per participant per calendar day). It is built by automatically joining data from multiple GGIR output parts:
| Source | Description |
|---|---|
| Part 5 day summary | Day-level physical activity and time-use data. This forms the base of the file. |
| Part 4 night summary | Per-night sleep variables (sleep onset, wake time, WASO, sleep duration, etc.). |
| Part 2 day summary | Daily activity summaries (L5/M5, MVPA). |
| Part 2 person summary | Recording-level metadata (device serial number, calibration error, measurement duration, etc.). |
| Data quality report | Calibration and file quality indicators. |
| Epoch-level time series | If Part 5 time series output was saved (the default), epoch-level data is nested within each day for fine-grained visualisation in the dashboard. |
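To make the day-level join concrete, here is a small Python sketch that left-joins night-level rows onto day-level base rows by participant and date. It is an illustration only, not GGIR's implementation: the function name, the sample rows, and the column names mvpa_min and sleep_duration_h are made up for the example.

```python
def left_join_on_day(base_rows, other_rows, keys=("id", "calendar_date")):
    """Attach columns from other_rows to base_rows where the key columns match.

    Base rows without a match are kept unchanged, as in a SQL LEFT JOIN.
    If other_rows repeats a key, the last occurrence is used.
    """
    lookup = {tuple(row[k] for k in keys): row for row in other_rows}
    joined = []
    for row in base_rows:
        match = lookup.get(tuple(row[k] for k in keys), {})
        # Base values win if both sides carry a column with the same name.
        joined.append({**match, **row})
    return joined


# Hypothetical rows standing in for the Part 5 day summary (the base) ...
part5_days = [
    {"id": "p001", "calendar_date": "2024-03-15", "mvpa_min": 42.0},
    {"id": "p001", "calendar_date": "2024-03-16", "mvpa_min": 30.5},
]
# ... and for the Part 4 night summary.
part4_nights = [
    {"id": "p001", "calendar_date": "2024-03-15", "sleep_duration_h": 7.2},
]

days = left_join_on_day(part5_days, part4_nights)
print(days[0])
print(days[1])
```

The second day has no matching night row, so it keeps only its base columns. In a tabular file the same situation would show up as missing sleep values for that day.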
The file also carries embedded metadata such as the acceleration thresholds used, the acceleration metric (e.g., ENMO), epoch length, and a variable dictionary — all of which the dashboard uses to correctly label and interpret your data.
Database Schema & Developer Notes
For users who want to query the Parquet file outside of this dashboard (e.g., via Python, R, or native DuckDB), the file uses a flat, day-level schema where each row represents one participant per calendar day. The primary key is (id, calendar_date).
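Because each (id, calendar_date) pair should appear only once, a uniqueness check is a reasonable first step after loading the file with the tool of your choice. The Python sketch below assumes the rows are already available as dictionaries; the helper name and sample rows are hypothetical.

```python
from collections import Counter


def duplicate_keys(rows, keys=("id", "calendar_date")):
    """Return the key tuples that occur more than once, in sorted order."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


rows = [
    {"id": "p001", "calendar_date": "2024-03-15"},
    {"id": "p001", "calendar_date": "2024-03-16"},
    {"id": "p002", "calendar_date": "2024-03-15"},
    {"id": "p001", "calendar_date": "2024-03-15"},  # deliberate duplicate
]

print(duplicate_keys(rows))
```

An empty result means the primary key holds for the rows that were checked.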
Nested epochs Column
Each row's epochs field is a LIST of STRUCTS. Each struct represents one epoch (typically 5 seconds). The fields are:
| Field | Type | Description |
|---|---|---|
| timenum | INT64 / DOUBLE | Unix timestamp (seconds since 1970-01-01) |
| acc | DOUBLE | Acceleration metric for this epoch (typically ENMO in milli-gravity) |
| class_id | INT32 | Behavioral class code (see legend below) |
| spt | BOOLEAN | TRUE if this epoch falls within the sleep period time |
| invalid | BOOLEAN | TRUE if classified as invalid (non-wear, clipping, etc.) |
| window | INT32 | Window/day number this epoch belongs to |
| anglez | DOUBLE | (optional) Arm angle relative to horizontal, in degrees |
| lux | DOUBLE | (optional) Light intensity in lux |
| temperature | DOUBLE | (optional) Skin/device temperature in Celsius |
| steps | INT32 | (optional) Step count for this epoch |
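The nested layout can be unpacked into one record per epoch. The Python sketch below works on plain dictionaries shaped like the table above. The sample values are invented, the function name is hypothetical, and timestamps are rendered in UTC purely for illustration, so local study time may differ.

```python
from datetime import datetime, timezone


def flatten_epochs(day_rows):
    """Yield one dict per epoch, carrying the parent row's id and calendar_date."""
    for row in day_rows:
        for epoch in row.get("epochs") or []:
            yield {
                "id": row["id"],
                "calendar_date": row["calendar_date"],
                # timenum is seconds since 1970-01-01; shown here in UTC.
                "time_utc": datetime.fromtimestamp(epoch["timenum"], tz=timezone.utc),
                **epoch,
            }


start = datetime(2024, 3, 15, 12, 0, tzinfo=timezone.utc).timestamp()
day_rows = [
    {
        "id": "p001",
        "calendar_date": "2024-03-15",
        "epochs": [
            {"timenum": start, "acc": 12.5, "class_id": 0,
             "spt": False, "invalid": False, "window": 1},
            {"timenum": start + 5, "acc": 130.0, "class_id": 2,
             "spt": False, "invalid": False, "window": 1},
            {"timenum": start + 10, "acc": 0.0, "class_id": 0,
             "spt": False, "invalid": True, "window": 1},
        ],
    }
]

epochs = list(flatten_epochs(day_rows))
# Mean acceleration over epochs that are not flagged as invalid.
valid_acc = [e["acc"] for e in epochs if not e["invalid"]]
mean_valid_acc = sum(valid_acc) / len(valid_acc)
print(len(epochs), mean_valid_acc)
```

Filtering on invalid before averaging keeps non-wear and clipped epochs out of the summary.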
Behavioral Classes
The class_id maps to behavioral classes. The exact mapping is stored as JSON in the behavioral_codes Parquet metadata key. A standard mapping is:
- 0 = Inactive during waking (IN)
- 1 = Light physical activity (LIG)
- 2 = Moderate physical activity (MOD)
- 3 = Vigorous physical activity (VIG)
- 4 = Sleep during SPT
- 5–8 = Awake during SPT at various intensities
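With the standard mapping, time spent in each class can be tallied from the class_id values. This Python sketch hard-codes the standard mapping listed above and assumes 5-second epochs. For a real file, prefer the mapping stored under behavioral_codes and the epoch length from the file metadata. The function name, the grouped labels for SPT classes, and the sample data are hypothetical.

```python
from collections import Counter

# Standard mapping from the list above; codes 5-8 are grouped under one label.
STANDARD_CLASSES = {
    0: "IN",
    1: "LIG",
    2: "MOD",
    3: "VIG",
    4: "sleep_in_spt",
    **{code: "awake_in_spt" for code in range(5, 9)},
}


def minutes_per_class(class_ids, epoch_seconds=5, labels=STANDARD_CLASSES):
    """Convert a sequence of class_id values into minutes per class label."""
    counts = Counter(labels.get(cid, "unknown") for cid in class_ids)
    return {label: n * epoch_seconds / 60 for label, n in counts.items()}


# 24 epochs inactive, 12 light, 12 moderate, 6 awake during SPT (code 6).
sample = [0] * 24 + [1] * 12 + [2] * 12 + [6] * 6
totals = minutes_per_class(sample)
print(totals)
```

Codes that are missing from the mapping are counted under "unknown" rather than dropped, which makes a mismatched mapping easy to spot.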
Query Examples (DuckDB)
The queries below assume the Parquet file has been exposed as a table or view named ggir, for example a view defined over read_parquet('participant_id.parquet').
Day-level summary with non-wear filter:
SELECT id, calendar_date, weekday,
       dur_day_total_mod_min + dur_day_total_vig_min AS mvpa_min,
       dur_spt_sleep_min, sleep_efficiency_after_onset
FROM ggir
WHERE nonwear_perc_day_spt < 25;
Flatten epochs for a specific day:
SELECT g.id, g.calendar_date, e.*
FROM ggir g, UNNEST(g.epochs) AS e
WHERE g.id = 'participant_001'
  AND g.calendar_date = '2024-03-15';
Read file metadata:
SELECT * FROM parquet_kv_metadata('participant_id.parquet');
Privacy & Data Security
This dashboard is designed with researcher data privacy as a core principle. It uses DuckDB-WASM to query and process your data entirely within your browser.
No data leaves your machine. Your Parquet file is processed locally in your browser's memory and is never transmitted to any server.
- File reading — your file is read directly into browser memory via the File API; no upload occurs.
- SQL queries — all DuckDB queries execute inside a WebAssembly sandbox in your browser tab.
- Visualizations — every chart is rendered client-side; no server-side rendering or data relay.
- No analytics on your data — the app does not send your participant data to any analytics or tracking service.
- Works offline — after the initial page load, the dashboard functions fully without an internet connection. You can verify this by disconnecting after the page loads.
Troubleshooting
If the Parquet file is not created after your GGIR run completes, check the R console for the following warning messages:
| Warning | Cause |
|---|---|
| "No results directory found" | GGIR did not produce any output. Verify that outputdir is correct and that the pipeline completed without errors. |
| "No Part 5 day summary CSVs found" | Part 5 and its report have not been run. Ensure mode includes 5 and do.report includes 5. |
| "Part 5 day summary CSVs are empty" | Part 5 ran but produced no valid data rows. Check your input data and cleaning/inclusion thresholds. |
| "Consolidated data is empty" | The join across parts produced zero rows. This may indicate a mismatch in participant IDs or calendar dates across parts. |
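For the last warning, comparing the join keys of two report files can show where the mismatch lies. The Python sketch below reads CSV text with the standard library and reports how many key pairs overlap. The column names ID and calendar_date and the inline sample data are assumptions, so adjust them to the headers in your own report files.

```python
import csv
import io


def key_set(csv_text, keys=("ID", "calendar_date")):
    """Collect the set of key tuples found in CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {tuple(row[k].strip() for k in keys) for row in reader}


def compare_keys(left_text, right_text, keys=("ID", "calendar_date")):
    """Summarise how the key tuples of two CSV tables overlap."""
    left, right = key_set(left_text, keys), key_set(right_text, keys)
    return {
        "shared": len(left & right),
        "only_left": sorted(left - right),
        "only_right": sorted(right - left),
    }


# Invented sample data: the second table uses a different date format,
# so no key matches even though the participant and day are the same.
part5_csv = "ID,calendar_date\np001,2024-03-15\np001,2024-03-16\n"
part4_csv = "ID,calendar_date\np001,15/03/2024\n"

report = compare_keys(part5_csv, part4_csv)
print(report["shared"], report["only_right"])
```

To run this on real reports, read each file into a string first (for example with pathlib.Path.read_text) and pass the two strings to compare_keys.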