Generating Parquet Files with GGIR

GGIR can export its key output data to a single consolidated Apache Parquet file. This file is designed to be uploaded directly to the GGIR Dashboard, where you can interactively explore and visualise your study results without any additional coding or data processing.

Prerequisites

The arrow R package must be installed. GGIR requires R ≥ 4.0.

install.packages("arrow")

For full documentation on GGIR installation and configuration, see the official GGIR documentation and the CRAN package page.

How to Enable Parquet Export

Set the parameter save_dashboard_parquet to TRUE in your GGIR() call. The Parquet file is generated automatically after all requested parts and reports have completed.

Important: All five parts (1–5) and the reports for parts 2, 4, and 5 must have run successfully before the export can produce meaningful output, because the Parquet file consolidates data from across these reports.

save_dashboard_parquet defaults to FALSE. No Parquet file is generated unless you explicitly set it to TRUE.

Code Examples

Minimal example

library(GGIR)
GGIR(
  mode = 1:5,
  datadir = "C:/mystudy/mydata",
  outputdir = "C:/myresults",
  do.report = c(2, 4, 5),
  save_dashboard_parquet = TRUE
)

Combined with other GGIR parameters

library(GGIR)
GGIR(
  mode = 1:5,
  datadir = "C:/mystudy/mydata",
  outputdir = "C:/myresults",
  studyname = "my_study",
  do.report = c(2, 4, 5),
  threshold.lig = 40,
  threshold.mod = 100,
  threshold.vig = 400,
  save_dashboard_parquet = TRUE
)

Output File

When enabled, the following file is created in the results directory:

<outputdir>/output_<studyname>/results/participant_id.parquet

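Replace <outputdir> and <studyname> with the values you passed to GGIR(). If you did not set studyname, GGIR uses the name of the input folder (the last component of datadir), so the minimal example above writes to C:/myresults/output_mydata/results/participant_id.parquet.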
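If you post-process results from a script, the path rule above can be sketched as a small helper. This is only an illustration: the function name expected_parquet_path is hypothetical, and the fallback to the last component of datadir follows the description in this section rather than GGIR's source code.

```python
from pathlib import PurePosixPath
from typing import Optional


def expected_parquet_path(outputdir: str, datadir: str,
                          studyname: Optional[str] = None) -> str:
    """Build the expected location of the dashboard Parquet file.

    Hypothetical helper: it only mirrors the naming rule described in the text.
    """
    # Without an explicit studyname, fall back to the input folder name.
    name = studyname if studyname else PurePosixPath(datadir).name
    path = (PurePosixPath(outputdir) / f"output_{name}"
            / "results" / "participant_id.parquet")
    return str(path)


# Values taken from the two GGIR() examples above.
print(expected_parquet_path("C:/myresults", "C:/mystudy/mydata"))
print(expected_parquet_path("C:/myresults", "C:/mystudy/mydata",
                            studyname="my_study"))
```

Forward slashes are used throughout, matching the paths in the R examples.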

Using with the Dashboard

Once GGIR has finished processing and the Parquet file has been generated:

Navigate to the GGIR Dashboard.
Upload your participant_id.parquet file.
The dashboard will automatically parse the file and present interactive visualisations of your study data.

No additional software, coding, or data manipulation is required.

Available Views

After uploading, you can explore your data across three pages:

Upload: Preview the raw data table, inspect the column schema, and review embedded file metadata.
Dashboard: Study-level charts organised into Data Quality, Sleep Analysis, and Physical Activity tabs. Tabs are enabled automatically based on the columns present in your file.
Epoch Explorer: Participant-level, day-by-day epoch time series. Requires the nested epochs column (included by default when Part 5 time series output is saved).

What the Parquet File Contains

The Parquet file is a single consolidated table at the day level (one row per participant per calendar day). It is built by automatically joining data from multiple GGIR output parts:

Source	Description
Part 5 day summary	Day-level physical activity and time-use data. This forms the base of the file.
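Part 4 night summary	Per-night sleep variables: sleep onset, wake time, wake after sleep onset (WASO), sleep duration, etc.
Part 2 day summary	Daily activity summaries: least and most active 5-hour periods (L5/M5) and moderate-to-vigorous physical activity (MVPA).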
Part 2 person summary	Recording-level metadata (device serial number, calibration error, measurement duration, etc.).
Data quality report	Calibration and file quality indicators.
Epoch-level time series	If Part 5 time series output was saved (the default), epoch-level data is nested within each day for fine-grained visualisation in the dashboard.

The file also carries embedded metadata such as the acceleration thresholds used, the acceleration metric (e.g., ENMO), epoch length, and a variable dictionary — all of which the dashboard uses to correctly label and interpret your data.

Database Schema & Developer Notes

For users who want to query the Parquet file outside the dashboard (e.g., from Python, R, or native DuckDB): the file uses a day-level schema in which each row represents one participant on one calendar day. The columns are flat, with the exception of the nested epochs column described in the next section. The primary key is (id, calendar_date).
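Because the key is composite, a quick check after loading the table into any client is to confirm that no (id, calendar_date) pair repeats. The sketch below uses plain Python dictionaries as stand-ins for loaded rows. The sample values are invented for illustration, and duplicate_keys is a hypothetical helper, not part of GGIR or the dashboard.

```python
from collections import Counter


def duplicate_keys(rows):
    """Return the (id, calendar_date) pairs that occur more than once."""
    counts = Counter((row["id"], row["calendar_date"]) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Invented rows, for illustration only; the last one is a deliberate duplicate.
sample_rows = [
    {"id": "participant_001", "calendar_date": "2024-03-15"},
    {"id": "participant_001", "calendar_date": "2024-03-16"},
    {"id": "participant_002", "calendar_date": "2024-03-15"},
    {"id": "participant_001", "calendar_date": "2024-03-15"},
]

print(duplicate_keys(sample_rows))
```

An empty result means the key is unique in the rows you loaded.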

Nested `epochs` Column

Each row's epochs field is a LIST of STRUCTS. Each struct represents one epoch (typically 5 seconds). The fields are:

Field	Type	Description
timenum	INT64 / DOUBLE	Unix timestamp (seconds since 1970-01-01 00:00:00 UTC)
acc	DOUBLE	Acceleration metric for this epoch (typically ENMO, in milli-g)
class_id	INT32	Behavioral class code (see legend below)
spt	BOOLEAN	TRUE if this epoch falls within the sleep period time
invalid	BOOLEAN	TRUE if classified as invalid (non-wear, clipping, etc.)
window	INT32	Window/day number this epoch belongs to
anglez	DOUBLE	(optional) Angle of the device z-axis relative to the horizontal plane (a proxy for arm angle on wrist-worn devices), in degrees
lux	DOUBLE	(optional) Light intensity in lux
temperature	DOUBLE	(optional) Skin/device temperature in degrees Celsius
steps	INT32	(optional) Step count for this epoch
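To make the field semantics concrete, here is a minimal sketch that converts timenum to a timezone-aware timestamp and averages acc over valid epochs only. The epochs are invented stand-ins for one row's epochs list, and both helper functions are hypothetical. The conversion yields UTC; showing local clock time would additionally need the study's time zone, which this sketch does not attempt.

```python
from datetime import datetime, timezone

# Invented epochs for illustration; real rows come from the epochs column.
sample_epochs = [
    {"timenum": 1710500400, "acc": 12.5, "class_id": 0,
     "spt": False, "invalid": False, "window": 1},
    {"timenum": 1710500405, "acc": 130.0, "class_id": 2,
     "spt": False, "invalid": False, "window": 1},
    {"timenum": 1710500410, "acc": 0.0, "class_id": 0,
     "spt": False, "invalid": True, "window": 1},
]


def epoch_time_utc(epoch):
    """Convert an epoch's Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch["timenum"], tz=timezone.utc)


def mean_valid_acc(epochs):
    """Average acc over epochs not flagged invalid; None if there are none."""
    valid = [e["acc"] for e in epochs if not e["invalid"]]
    return sum(valid) / len(valid) if valid else None


print(epoch_time_utc(sample_epochs[0]).isoformat())  # 2024-03-15T11:00:00+00:00
print(mean_valid_acc(sample_epochs))                 # 71.25
```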

Behavioral Classes

Each class_id value identifies a behavioral class. The exact mapping for a given file is stored as JSON in the behavioral_codes Parquet metadata key. A standard mapping is:

0 = Inactive during waking (IN)
1 = Light physical activity (LIG)
2 = Moderate physical activity (MOD)
3 = Vigorous physical activity (VIG)
4 = Sleep during SPT
5–8 = Awake during SPT at various intensities
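A lookup for these codes might look like the sketch below. The fallback table copies the standard mapping listed above. The JSON layout accepted by load_class_labels (an object mapping code strings to labels) is an assumption made for illustration; inspect the behavioral_codes value in your own file before relying on it. Both function names are hypothetical.

```python
import json

# Standard mapping as listed above; codes 5-8 share one generic label here.
STANDARD_CLASSES = {
    0: "Inactive during waking (IN)",
    1: "Light physical activity (LIG)",
    2: "Moderate physical activity (MOD)",
    3: "Vigorous physical activity (VIG)",
    4: "Sleep during SPT",
    **{code: "Awake during SPT" for code in range(5, 9)},
}


def load_class_labels(behavioral_codes_json=None):
    """Return a {class_id: label} dict.

    ASSUMPTION: the metadata JSON is an object such as {"0": "label", ...}.
    The real layout may differ, so verify it against your file.
    """
    if behavioral_codes_json is None:
        return dict(STANDARD_CLASSES)
    parsed = json.loads(behavioral_codes_json)
    return {int(code): label for code, label in parsed.items()}


def class_label(class_id, labels):
    """Look up a label, falling back to a visible placeholder."""
    return labels.get(class_id, f"Unknown class {class_id}")


labels = load_class_labels()
print(class_label(2, labels))
print(class_label(7, labels))
```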

Query Examples (DuckDB)

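The queries below assume the file is available as a table or view named ggir. In native DuckDB, you can replace ggir with read_parquet('participant_id.parquet').

Day-level summary with non-wear filter: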

SELECT id, calendar_date, weekday,
       dur_day_total_mod_min + dur_day_total_vig_min AS mvpa_min,
       dur_spt_sleep_min, sleep_efficiency_after_onset
FROM ggir
WHERE nonwear_perc_day_spt < 25;

Flatten epochs for a specific day:

SELECT g.id, g.calendar_date, e.*
FROM ggir g, UNNEST(g.epochs) AS e
WHERE g.id = 'participant_001'
  AND g.calendar_date = '2024-03-15';
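For readers working with rows already loaded into Python, the same selection can be sketched in plain Python as one output record per epoch, with the day-level key repeated on each record. The rows are invented for illustration and flatten_epochs is a hypothetical helper; a missing or empty epochs value is treated as no epochs.

```python
def flatten_epochs(day_rows, participant_id, calendar_date):
    """Return one flat dict per epoch for the selected participant and day."""
    flat = []
    for row in day_rows:
        if row["id"] != participant_id or row["calendar_date"] != calendar_date:
            continue
        # Treat a missing or None epochs value as an empty list.
        for epoch in row.get("epochs") or []:
            flat.append({"id": row["id"],
                         "calendar_date": row["calendar_date"],
                         **epoch})
    return flat


# Invented day-level rows, for illustration only.
sample_days = [
    {"id": "participant_001", "calendar_date": "2024-03-15",
     "epochs": [{"timenum": 1710500400, "acc": 12.5, "class_id": 0},
                {"timenum": 1710500405, "acc": 130.0, "class_id": 2}]},
    {"id": "participant_001", "calendar_date": "2024-03-16", "epochs": None},
]

for record in flatten_epochs(sample_days, "participant_001", "2024-03-15"):
    print(record)
```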

Read file metadata:

SELECT * FROM parquet_kv_metadata('participant_id.parquet');

Privacy & Data Security

This dashboard is designed with researcher data privacy as a core principle. It uses DuckDB-WASM to query and process your data entirely within your browser.

No data leaves your machine. Your Parquet file is processed locally in your browser's memory and is never transmitted to any server.

File reading: despite the "Upload" label, your file is read directly into browser memory via the File API and is never sent over the network.
SQL queries: all DuckDB queries execute inside a WebAssembly sandbox in your browser tab.
Visualisations: every chart is rendered client-side, with no server-side rendering or data relay.
No analytics on your data: the app does not send your participant data to any analytics or tracking service.
Works offline: after the initial page load, the dashboard functions fully without an internet connection. You can verify this by disconnecting after the page loads.

Troubleshooting

If the Parquet file is not created after your GGIR run completes, check the R console for the following warning messages:

Warning	Cause
"No results directory found"	GGIR did not produce any output. Verify that outputdir is correct and that the pipeline completed without errors.
"No Part 5 day summary CSVs found"	Part 5 and its report have not been run. Ensure mode includes 5 and do.report includes 5.
"Part 5 day summary CSVs are empty"	Part 5 ran but produced no valid data rows. Check your input data and cleaning/inclusion thresholds.
"Consolidated data is empty"	The join across parts produced zero rows. This may indicate a mismatch in participant IDs or calendar dates across parts.