Study protocol · Pre-data

Real-Time Multimodal Sleep Staging from Consumer Wearable Sensors Validated Against Ear-EEG: A Study Protocol

Jonathan Berent

NextSense, Inc., Mountain View, CA, USA · Correspondence: jb@nextsense.io

NextSense study protocol (target: JMIR Research Protocols / npj Digital Medicine). This is a study protocol: it specifies the design, methods, and planned analysis. No data have been collected and no results are reported. Execution is gated on IRB approval and device-ready data collection.

Abstract

Background. Polysomnography (PSG) is the clinical gold standard for sleep staging but is lab-bound and obtrusive, limiting longitudinal, real-world monitoring. Consumer wearables promise accessible sleep tracking, yet most are evaluated offline rather than in real time, typically against PSG, and few resolve the full stage structure. Objective. We describe a protocol to evaluate the feasibility of predicting sleep stages in real time from heart-rate, accelerometer, and microphone data recorded by a consumer smartwatch, using simultaneously recorded ear-EEG as the reference standard, and to compare three sensor combinations. Methods. Prospective, single-center, observational study; up to 18 healthy adults (≥22 years, no known sleep disorders) will be enrolled to yield ≥16 evaluable participants, each recorded for one overnight session with a smartwatch (heart rate, tri-axial accelerometer, microphone) and a simultaneous ear-EEG device whose recordings are scored into four stages (wake, light, deep, REM). Planned analysis. The primary endpoint is agreement (Cohen's κ) between watch-predicted and ear-EEG-reference stages; a one-sided non-inferiority test evaluates whether κ is non-inferior to a substantial-agreement criterion of 0.61, at α = 0.05 with 80% power. Secondary analyses compare the three sensor arms and report per-stage performance and derived sleep parameters. Status. Pre-data; execution pending IRB approval.

1. Introduction

Sleep is increasingly recognized as central to cognitive, metabolic, and emotional health, yet the gold-standard tool for measuring it, polysomnography (PSG), remains confined to the laboratory. PSG is comprehensive but obtrusive, expensive, and typically limited to one or two nights, which makes it poorly suited to the longitudinal, ecological monitoring that researchers and consumers increasingly want. Wearable sensors embedded in consumer devices have transformed access to sleep information, but two gaps persist. First, most consumer systems estimate sleep offline, after the night, whereas many of the most valuable applications (smart-alarm timing, closed-loop audio, just-in-time interventions) require staging in real time. Second, validation has overwhelmingly used wrist actigraphy against PSG, and non-EEG modalities alone struggle to resolve the full set of sleep stages.

Ear-EEG offers a practical reference standard that is itself wearable: in- and around-ear electrodes recover sleep-stage structure with agreement against PSG approaching expert inter-scorer reliability. This makes possible a study that would be impractical with PSG at scale: validating real-time, smartwatch-based staging against a comfortable neural reference across full nights at home or in a sleep-friendly setting. This protocol specifies such a study.

2. Objectives

Primary objective. To evaluate the feasibility of predicting sleep stages in real time from heart-rate, accelerometer, and microphone data recorded with a consumer smartwatch, by comparing watch-derived stage predictions against simultaneously recorded ear-EEG reference stages.

Secondary objective. To compare real-time staging performance across three sensor combinations: (i) heart rate + accelerometer; (ii) microphone; and (iii) heart rate + accelerometer + microphone.

3. Methods

3.1 Study design. Prospective, single-center, observational study. Each participant contributes one overnight recording session.

3.2 Participants. Adults aged ≥22 years with no known sleep disorders will be enrolled. Up to 18 participants will be enrolled to ensure a minimum of 16 evaluable participants, allowing for ~10% attrition or data loss. Inclusion/exclusion criteria, recruitment, and informed-consent procedures will be specified in the IRB-approved protocol.

3.3 Apparatus and data acquisition. Smartwatch signals will be acquired via standard mobile APIs: tri-axial accelerometer at 50–100 Hz (CoreMotion), heart rate at ~1 Hz (HealthKit), and stereo audio from the built-in microphones (AVFAudio). Simultaneously, an ear-EEG device will record overnight to provide the reference. All streams are timestamped to a common clock so that they can be realigned despite their differing sampling rates. Watch predictions and reference scores will both use four sleep stages: wake, light sleep, deep sleep, and REM.
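The realignment step can be sketched as binning each timestamped stream into 30-second epochs counted from a shared start time. This is a minimal sketch, assuming every sample is a (timestamp in seconds, value) pair already expressed on the common clock and that the recording start defines epoch 0; the function name `bin_into_epochs` is hypothetical and not part of any platform API.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

EPOCH_S = 30.0  # standard scoring epoch length in seconds


def bin_into_epochs(
    samples: Sequence[Tuple[float, float]],
    t0: float,
    epoch_s: float = EPOCH_S,
) -> Dict[int, List[float]]:
    """Group (timestamp, value) samples into epoch indices relative to t0.

    Hypothetical helper: assumes timestamps are seconds on the shared clock.
    Samples earlier than t0 are discarded. Streams sampled at different rates
    end up keyed by the same epoch indices, so epoch k of the heart-rate stream
    lines up with epoch k of the accelerometer and ear-EEG streams.
    """
    epochs: Dict[int, List[float]] = defaultdict(list)
    for t, value in samples:
        if t < t0:
            continue
        epochs[int((t - t0) // epoch_s)].append(value)
    return dict(epochs)


if __name__ == "__main__":
    hr = [(100.0, 60.0), (101.0, 61.0), (131.0, 58.0)]    # ~1 Hz heart rate (bpm)
    acc = [(100.0 + i / 50.0, 0.0) for i in range(3000)]  # 50 Hz for 60 s
    print(bin_into_epochs(hr, t0=100.0))                  # {0: [60.0, 61.0], 1: [58.0]}
    print({k: len(v) for k, v in bin_into_epochs(acc, t0=100.0).items()})  # {0: 1500, 1: 1500}
```

Epochs with no samples in a stream are simply absent from that stream's dictionary, leaving the pre-specified missing-epoch rule to decide how they are treated.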

3.4 Real-time staging pipeline. A staging model will be trained offline on existing labeled data and then applied online to incoming smartwatch streams. Per-epoch features (band-limited and statistical features from accelerometer, heart rate, and audio) feed a classifier that produces a stage estimate at the standard 30-second epoch cadence. The same trained pipeline will be evaluated under each of the three sensor-combination arms to isolate the contribution of each modality. Because a new stage estimate is needed only once every 30 seconds, the real-time pipeline can tolerate latencies of several seconds, which makes on-device or phone-side inference feasible.
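A minimal sketch of per-epoch feature extraction with arm-specific input selection follows. It assumes accelerometer samples are (x, y, z) tuples, heart rate is in bpm, and audio is a sequence of normalized amplitudes; only simple statistical features are shown (band-limited features are omitted), and the names `ARMS`, `epoch_features`, `select_arm`, and `stage_stream` are hypothetical. The classifier is passed in as a plain callable because the protocol does not fix a model.

```python
import math
import statistics
from typing import Callable, Dict, Iterable, Iterator, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Features = Dict[str, float]

# Hypothetical arm definitions: feature-name prefixes each arm may use.
ARMS: Dict[str, Tuple[str, ...]] = {
    "hr_acc": ("hr_", "acc_"),               # arm (i)
    "mic": ("mic_",),                        # arm (ii)
    "hr_acc_mic": ("hr_", "acc_", "mic_"),   # arm (iii)
}


def epoch_features(acc: Sequence[Vec3], hr: Sequence[float], audio: Sequence[float]) -> Features:
    """Illustrative statistical features for one 30-s epoch (inputs must be non-empty)."""
    mags = [math.sqrt(x * x + y * y + z * z) for x, y, z in acc]
    return {
        "acc_mag_mean": statistics.fmean(mags),
        "acc_mag_sd": statistics.pstdev(mags),
        "hr_mean": statistics.fmean(hr),
        "hr_sd": statistics.pstdev(hr),
        "mic_rms": math.sqrt(statistics.fmean([a * a for a in audio])),
    }


def select_arm(features: Features, arm: str) -> Features:
    """Keep only the features that the given arm is allowed to use."""
    prefixes = ARMS[arm]
    return {name: value for name, value in features.items() if name.startswith(prefixes)}


def stage_stream(
    epochs: Iterable[Tuple[Sequence[Vec3], Sequence[float], Sequence[float]]],
    classify: Callable[[Features], str],
    arm: str,
) -> Iterator[str]:
    """Emit one stage label per completed epoch, as an online pipeline would."""
    for acc, hr, audio in epochs:
        yield classify(select_arm(epoch_features(acc, hr, audio), arm))
```

For example, `stage_stream(epochs, classify=my_model, arm="mic")` would pass only `mic_rms` to a hypothetical `my_model`. Computing all features and then masking keeps the three arms on identical feature code; an on-device build would more likely skip unused modalities entirely to save power.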

3.5 Reference standard. Ear-EEG recordings will be scored into the four stages above to serve as the per-epoch reference against which watch predictions are compared. Scoring procedure, scorer training, and any consensus rules will be pre-specified in the IRB-approved protocol; current AASM-aligned guidance will be followed where applicable.

3.6 Endpoints. Primary endpoint. Epoch-by-epoch agreement between watch-predicted stages and ear-EEG reference stages, quantified by Cohen's κ. A κ above 0.61 is interpreted as at least substantial agreement (κ of 0.61–0.80 is conventionally labeled substantial, and above 0.80 almost perfect). Secondary endpoints. Per-arm κ (the three sensor combinations); per-stage sensitivity and specificity and confusion matrices; and agreement on derived sleep parameters (total sleep time, sleep-onset latency, sleep efficiency, and wake after sleep onset), assessed with Bland–Altman analysis.
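The primary and per-stage metrics can all be computed from one confusion matrix. The sketch below implements Cohen's κ = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is chance agreement from the row and column marginals, plus one-vs-rest sensitivity and specificity per stage. It assumes reference and predicted labels are already aligned epoch by epoch and use the four stage names shown; the function names are hypothetical. An established library routine such as scikit-learn's `cohen_kappa_score` should give the same κ.

```python
import math
from typing import Dict, List, Sequence, Tuple

STAGES: Tuple[str, ...] = ("wake", "light", "deep", "rem")
Matrix = List[List[int]]


def confusion_matrix(reference: Sequence[str], predicted: Sequence[str],
                     labels: Sequence[str] = STAGES) -> Matrix:
    """Rows = ear-EEG reference stage, columns = watch-predicted stage."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must cover the same epochs")
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for ref, pred in zip(reference, predicted):
        matrix[index[ref]][index[pred]] += 1
    return matrix


def _marginals(matrix: Matrix) -> Tuple[int, List[int], List[int]]:
    k = len(matrix)
    rows = [sum(matrix[i]) for i in range(k)]
    cols = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    return sum(rows), rows, cols


def cohens_kappa(matrix: Matrix) -> float:
    """Unweighted Cohen's kappa from a square confusion matrix."""
    n, rows, cols = _marginals(matrix)
    p_o = sum(matrix[i][i] for i in range(len(matrix))) / n
    p_e = sum(r * c for r, c in zip(rows, cols)) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


def per_stage_sens_spec(matrix: Matrix,
                        labels: Sequence[str] = STAGES) -> Dict[str, Tuple[float, float]]:
    """One-vs-rest (sensitivity, specificity) per stage; NaN where undefined."""
    n, rows, cols = _marginals(matrix)
    out: Dict[str, Tuple[float, float]] = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = rows[i] - tp
        fp = cols[i] - tp
        tn = n - tp - fn - fp
        sens = tp / (tp + fn) if tp + fn else math.nan
        spec = tn / (tn + fp) if tn + fp else math.nan
        out[label] = (sens, spec)
    return out


if __name__ == "__main__":
    ref = ["wake", "wake", "light", "light", "deep", "rem"]
    pred = ["wake", "light", "light", "light", "deep", "rem"]
    m = confusion_matrix(ref, pred)
    print(round(cohens_kappa(m), 3))  # 0.769
```

Per-arm κ follows by running the same functions on each arm's predictions against the shared reference.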

4. Statistical analysis plan

The primary analysis is a one-sided non-inferiority test of whether Cohen's κ for watch-based staging exceeds the acceptance criterion of 0.61 (H0: κ ≤ 0.61 versus H1: κ > 0.61), at a significance level of α = 0.05 with statistical power of 0.80. Secondary analyses compare κ across the three sensor arms and summarize per-stage performance; derived sleep parameters are compared using Bland–Altman limits of agreement. Missing or unscorable epochs will be handled by a pre-specified rule and reported transparently.
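A sketch of the remaining analyses follows, under several labeled assumptions: the primary test is applied to participant-level κ values with a normal approximation (matching the z-based sample-size formula; with about 16 participants a t-distribution or bootstrap would be a more conservative choice); derived sleep parameters come from a per-epoch hypnogram assumed to span lights-off to lights-on, with WASO counting all wake after the first sleep epoch; and Bland–Altman limits are taken as bias ± 1.96 SD of the paired differences. All function names are hypothetical.

```python
import math
import statistics
from statistics import NormalDist
from typing import Dict, NamedTuple, Sequence

EPOCH_MIN = 0.5                    # one 30-s epoch, in minutes
SLEEP = {"light", "deep", "rem"}   # stages counted as sleep


class KappaTest(NamedTuple):
    mean_kappa: float
    z: float
    p_value: float
    reject_h0: bool


def one_sided_kappa_test(kappas: Sequence[float], criterion: float = 0.61,
                         alpha: float = 0.05) -> KappaTest:
    """H0: mean kappa <= criterion vs H1: mean kappa > criterion (normal approximation)."""
    mean = statistics.fmean(kappas)
    se = statistics.stdev(kappas) / math.sqrt(len(kappas))
    z = (mean - criterion) / se
    p = 1.0 - NormalDist().cdf(z)
    return KappaTest(mean, z, p, p < alpha)


def sleep_parameters(hypnogram: Sequence[str]) -> Dict[str, float]:
    """TST, SOL, WASO (minutes) and sleep efficiency SE (%) from per-epoch stages."""
    tib = len(hypnogram) * EPOCH_MIN
    sleep_idx = [i for i, stage in enumerate(hypnogram) if stage in SLEEP]
    if not sleep_idx:
        return {"TST": 0.0, "SOL": math.nan, "WASO": math.nan, "SE": 0.0}
    tst = len(sleep_idx) * EPOCH_MIN
    onset = sleep_idx[0]
    waso = sum(1 for stage in hypnogram[onset:] if stage == "wake") * EPOCH_MIN
    return {"TST": tst, "SOL": onset * EPOCH_MIN, "WASO": waso, "SE": 100.0 * tst / tib}


def bland_altman(watch: Sequence[float], reference: Sequence[float]) -> Dict[str, float]:
    """Bias and 95% limits of agreement for paired per-participant values."""
    if len(watch) != len(reference):
        raise ValueError("paired values required")
    diffs = [w - r for w, r in zip(watch, reference)]
    bias = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    return {"bias": bias, "loa_low": bias - 1.96 * sd, "loa_high": bias + 1.96 * sd}
```

In use, `sleep_parameters` would be run once on the watch hypnogram and once on the ear-EEG hypnogram for each participant, and each parameter's paired values passed to `bland_altman`.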

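Sample-size justification. The sample size derives from a power analysis for a one-sided non-inferiority test, N = (Zα + Zβ)² · σ² / d², where σ is the assumed between-participant standard deviation of κ and d is the assumed margin by which the true κ exceeds the 0.61 criterion. The calculation uses Zα = 1.96 and Zβ = 0.84 (power = 0.80). Note that 1.96 is the one-sided critical value for α = 0.025 (two-sided α = 0.05); for the one-sided α = 0.05 stated above the critical value is 1.645, so the figures below are conservative. Required N across plausible standard deviations (σ) and effect sizes (d), rounded up to the next whole participant: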

σ       Effect size d (implied κ = 0.61 + d)    Required N
0.05    0.10 (0.71)                              2
0.05    0.05 (0.66)                              8
0.05    0.02 (0.63)                              49
0.07    0.10 (0.71)                              4
0.07    0.05 (0.66)                              16
0.07    0.02 (0.63)                              97

Assuming σ = 0.07 and d = 0.05 (an expected κ of 0.66), the calculation requires 16 evaluable participants; allowing for 10% dropout (16 / 0.90 ≈ 17.8, rounded up), 18 participants will be enrolled. The final number may be refined using pilot data.
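The sample-size figures and the enrolment target can be reproduced with a few lines of arithmetic. This sketch uses the protocol's Zα = 1.96 and Zβ = 0.84 and rounds each N up to the next whole participant; the helper names are hypothetical. Passing `z_alpha=1.645` shows the effect of using the one-sided α = 0.05 critical value instead.

```python
import math
from typing import List, Tuple

Z_ALPHA = 1.96  # value used in the protocol (one-sided critical value for alpha = 0.025)
Z_BETA = 0.84   # power = 0.80
KAPPA_CRITERION = 0.61


def _ceil(x: float) -> int:
    # Round away floating-point noise (e.g. 49.000000000000007) before taking the ceiling.
    return math.ceil(round(x, 9))


def required_n(sigma: float, d: float, z_alpha: float = Z_ALPHA, z_beta: float = Z_BETA) -> int:
    """N = (z_alpha + z_beta)^2 * sigma^2 / d^2, rounded up."""
    return _ceil((z_alpha + z_beta) ** 2 * sigma ** 2 / d ** 2)


def enrolment_target(n_evaluable: int, dropout: float) -> int:
    """Participants to enrol so that n_evaluable remain after the given dropout fraction."""
    return _ceil(n_evaluable / (1.0 - dropout))


def sample_size_table() -> List[Tuple[float, float, float, int]]:
    """(sigma, d, implied kappa, N) for the scenarios considered in the protocol."""
    rows = []
    for sigma in (0.05, 0.07):
        for d in (0.10, 0.05, 0.02):
            rows.append((sigma, d, round(KAPPA_CRITERION + d, 2), required_n(sigma, d)))
    return rows


if __name__ == "__main__":
    for sigma, d, kappa, n in sample_size_table():
        print(f"sigma={sigma:.2f}  d={d:.2f} (kappa={kappa:.2f})  N={n}")
    print(enrolment_target(required_n(0.07, 0.05), 0.10))  # 18
```

Rounding up matters at the margin: the σ = 0.07, d = 0.02 scenario gives 96.04 before rounding and therefore 97 participants. With `z_alpha=1.645`, the σ = 0.07, d = 0.05 scenario needs 13 evaluable participants rather than 16.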

5. Data management

All signals are timestamped to a common clock and stored with provenance. Smartwatch and ear-EEG streams, derived features, model predictions, and reference scores will be retained to permit re-analysis. Data handling, de-identification, retention, and security will follow the IRB-approved data-management plan; given the sensitivity of physiological data, on-device processing and data minimization are preferred where feasible.

6. Ethics, status, and timeline

This study requires Institutional Review Board (IRB) approval prior to any data collection; informed consent will be obtained from all participants. The indicative timeline, once IRB approval is secured, is: device/app readiness and a pilot (1–2 participants); enrollment of up to 18 participants (one overnight session each); scoring and analysis; and manuscript preparation. This protocol may be deposited as a preprint and/or pre-registered prior to data collection.

Frequently asked questions

What does this study protocol propose to test?

Whether sleep stages can be predicted in real time from a consumer smartwatch’s heart-rate, accelerometer, and microphone signals, validated against simultaneously recorded ear-EEG as the reference standard, and how three sensor combinations compare. No data have been collected yet; execution is pending IRB approval.

Why use ear-EEG instead of polysomnography as the reference?

Polysomnography is the clinical gold standard but is lab-bound, obtrusive, and hard to run at scale. Ear-EEG recovers sleep-stage structure with agreement against PSG approaching expert inter-scorer reliability, while being wearable — making it a practical reference for validating real-time, at-home smartwatch staging across full nights.

What is the primary endpoint and success criterion?

Epoch-by-epoch agreement between watch-predicted and ear-EEG-reference stages, measured by Cohen’s kappa. A one-sided non-inferiority test evaluates whether kappa is non-inferior to a substantial-agreement criterion of 0.61, at alpha = 0.05 and 80% power, with up to 18 participants enrolled to yield at least 16 evaluable.

Acknowledgements

The conceptual design is adapted from an internal NextSense / Stephanie Martin Study Design Synopsis prepared with AE Studio, which we gratefully acknowledge.

How to cite

Berent J. “Real-time multimodal sleep staging from consumer wearable sensors validated against ear-EEG: a study protocol.” NextSense study protocol; 2026.

