Should I trust my wearable's deep sleep reading?

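Use it for trends (is your deep sleep increasing or decreasing?), not for absolute numbers. The devices underestimated deep sleep by 8–12 minutes, and that offset is consistent across nights, so night-to-night changes are more reliable than any single reading.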
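Why does a consistent underestimation still allow trend tracking? A constant offset drops out when you compare nights. The Python sketch below is illustrative only. The nightly minutes are made-up numbers, not study data, and it assumes a fixed 10-minute underestimation (inside the study's 8–12 min range). If the bias instead drifts anywhere within that range, a single night-to-night change can be off by at most 12 - 8 = 4 minutes.

```python
# Hypothetical nightly deep-sleep minutes: illustrative values, NOT study data.
psg_deep = [62, 70, 58, 75, 81]

# Assumed constant underestimation (chosen from the study's 8-12 min range).
BIAS_MIN = 10
wearable_deep = [m - BIAS_MIN for m in psg_deep]


def night_to_night_changes(series):
    """Difference between each night and the night before it."""
    return [b - a for a, b in zip(series, series[1:])]


def trend_slope(series):
    """Ordinary least-squares slope of the series, in minutes per night."""
    n = len(series)
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(series))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


# Subtracting a constant from every night leaves both the deltas and the slope unchanged.
print(night_to_night_changes(psg_deep))       # [8, -12, 17, 6]
print(night_to_night_changes(wearable_deep))  # [8, -12, 17, 6]
print(round(trend_slope(psg_deep), 2), round(trend_slope(wearable_deep), 2))
```

The absolute values differ by 10 minutes every night, yet both series show the same upward trend.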

Why are wearables less accurate for women?

Accuracy dropped most during the luteal phase. Physiological changes across the menstrual cycle, especially in body temperature and heart rate variability, may confound the staging algorithms. More sex-specific training data is needed.


Consumer Wearable Sleep Staging Shows Major Accuracy Gains — But Still Biased

May 6, 2026 · 1 min read

TL;DR

Consumer wearables are now accurate enough for trend tracking (91.5–94.2% sleep-wake accuracy) but still systematically over-identify light N1 sleep, by 32–39% (about 35% on average).

Background

Consumer sleep trackers have become ubiquitous, but their clinical accuracy has remained questionable. A new study from Stanford University School of Medicine, published in npj Digital Medicine [1], provides the most comprehensive head-to-head validation yet of three of the most popular devices against gold-standard polysomnography (PSG).

Key Findings

A total of 190 healthy adults wore an Apple Watch Series 9, an Oura Ring Gen 3, and a Fitbit Charge 6 simultaneously during two overnight PSG-monitored lab sessions each (380 nights in total).

Metric (vs. PSG)	Apple Watch Series 9	Oura Ring Gen 3	Fitbit Charge 6
Sleep-wake accuracy	94.2%	92.8%	91.5%
N3 (deep sleep) accuracy	81.3%	78.6%	74.1%
REM accuracy	83.7%	85.2%	79.8%
N1 over-identification	+32%	+35%	+39%
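The headline numbers can be checked with a little arithmetic. The Python sketch below transcribes the table into a dictionary (the key names such as `n1_over` are our own labels, not the study's) and reports each metric's range, mean, spread and best device. It assumes higher is better for the accuracy rows and lower is better for N1 over-identification.

```python
# Values transcribed from the table (percent, vs. PSG). Key names are our own labels.
results = {
    "Apple Watch Series 9": {"sleep_wake": 94.2, "n3": 81.3, "rem": 83.7, "n1_over": 32},
    "Oura Ring Gen 3":      {"sleep_wake": 92.8, "n3": 78.6, "rem": 85.2, "n1_over": 35},
    "Fitbit Charge 6":      {"sleep_wake": 91.5, "n3": 74.1, "rem": 79.8, "n1_over": 39},
}


def summarize(metric):
    """Return (lowest, highest, mean, best_device) for one metric across devices."""
    values = {device: row[metric] for device, row in results.items()}
    lowest, highest = min(values.values()), max(values.values())
    mean = sum(values.values()) / len(values)
    # Accuracy: higher is better. N1 over-identification: lower is better.
    pick = min if metric == "n1_over" else max
    best = pick(values, key=values.get)
    return lowest, highest, mean, best


for metric in ("sleep_wake", "n3", "rem", "n1_over"):
    lowest, highest, mean, best = summarize(metric)
    print(f"{metric}: {lowest}-{highest} (mean {mean:.1f}, spread {highest - lowest:.1f}), best: {best}")
```

Run as written, this shows sleep-wake accuracy spanning 91.5–94.2%, N1 over-identification averaging about 35.3%, the Apple Watch leading on three of four rows, and the Oura Ring leading on REM.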

All devices slightly underestimated total deep sleep (by 8–12 min) and showed lower accuracy in female participants, especially during the luteal phase.

Clinical Implications

Trend tracking works: the devices are excellent for measuring night-to-night variability.
Not diagnostic: the systematic N1 bias means these devices are not suitable for diagnosing sleep disorders.
Sex-inclusive design needed: future algorithms should calibrate for the menstrual cycle.

References

[1] https://doi.org/10.1038/s41746-026-00315-7

Frequently Asked Questions

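Which device was the most accurate?

The Apple Watch S9 led on three of the four metrics in this study (sleep-wake, N3, and the lowest N1 over-identification), but the Oura Ring had slightly better REM detection. The three devices were within 2.7 percentage points of each other on sleep-wake accuracy and within about 7 points on every metric.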