
How We Test Calorie Trackers

Last updated May 16, 2026

This page is the editorial spine of the publication. Every accuracy number you read on whatsthebestcalorietracker.app traces back to the protocol described here. For the long-form article version with worked examples and tier-specific results, see How We Test Calorie Trackers (2026).

Test phases

Phase	Sample	Output
256-meal weighed reference battery	256 meals × 6 apps	MAPE per app per tier
64-meal photo-AI subset	64 photos × photo-first apps	Photo-only MAPE
30-day field test	~120 logs per app	Completion, friction, ads, paywalls
Restaurant chain coverage	100 chains × 6 apps	First-result hit rate
Paywall + ad density	90 free-tier sessions × 6 apps	Encounters per session
Watch hand-off battery	4 hr × 6 apps × 2 watches	Battery drain %, sweaty-hands reliability

The 256-meal weighed reference battery

Reference calories are anchored to USDA FoodData Central per-component values, and every meal is weighed on a calibrated kitchen scale with 0.1 g resolution. Meals are stratified across three difficulty tiers:

Tier 1 (n=85) — single-ingredient plates. A roasted chicken breast. A bowl of plain steel-cut oats. A grilled salmon fillet.
Tier 2 (n=85) — composed plates. Salad with measured dressing. Sandwich with weighed components. Rice bowl with measured rice + protein + topping.
Tier 3 (n=86) — mixed dishes with hidden ingredients. Curry, casserole, layered pasta, stew. Each component weighed during preparation, but not separately visible at log time.
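Here is a minimal sketch of how a weighed reference value can be derived. It assumes that each component's energy density is taken as kcal per 100 g and scaled by that component's measured weight. The `Component` type, the `reference_kcal` function, and the example densities are hypothetical placeholders. They are not published FoodData Central values or the publication's actual tooling.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Component:
    """One weighed ingredient of a reference meal (hypothetical structure)."""
    name: str
    grams: float          # weight read from the 0.1 g kitchen scale
    kcal_per_100g: float  # per-component energy density (placeholder values below)


def reference_kcal(components: Iterable[Component]) -> float:
    """Reference energy of a meal: sum over components of grams * kcal_per_100g / 100."""
    return sum(c.grams * c.kcal_per_100g / 100.0 for c in components)


# Hypothetical Tier 2 rice bowl; densities are illustrative, not FoodData Central lookups.
rice_bowl = [
    Component("cooked rice", 180.0, 130.0),
    Component("chicken", 120.0, 165.0),
    Component("sauce", 15.0, 60.0),
]
print(reference_kcal(rice_bowl))  # 441.0
```

For Tier 3 dishes, the same sum would run over the components weighed during preparation. The page does not say how cooking losses or portioning from a larger batch are handled, so the sketch ignores both.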

Each meal is logged once in each app under test, using that app's primary logging workflow. MAPE (mean absolute percentage error) is computed per app per tier, with 95% confidence intervals from bootstrap resampling (10,000 resamples).
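The per-tier statistic can be sketched as follows. This sketch assumes that MAPE is the mean of |logged - reference| / reference across the meals in one tier. It also assumes that the 95% interval is a percentile bootstrap that resamples meals with replacement. The page does not name the bootstrap variant, and `mape` and `bootstrap_ci` are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple


def mape(reference: Sequence[float], logged: Sequence[float]) -> float:
    """Mean absolute percentage error (in %) of logged kcal against reference kcal."""
    if len(reference) != len(logged) or not reference:
        raise ValueError("reference and logged must be equal-length and non-empty")
    return 100.0 * sum(abs(est - ref) / ref for ref, est in zip(reference, logged)) / len(reference)


def bootstrap_ci(
    reference: Sequence[float],
    logged: Sequence[float],
    n_resamples: int = 10_000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for MAPE, resampling meals with replacement."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(reference)
    stats: List[float] = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # one resample of meal indices
        stats.append(mape([reference[i] for i in idx], [logged[i] for i in idx]))
    stats.sort()
    lo = stats[int(n_resamples * alpha / 2)]
    hi = stats[min(n_resamples - 1, int(n_resamples * (1 - alpha / 2)))]
    return lo, hi


# Hypothetical tier: reference kcal vs. one app's logged kcal for five meals.
ref = [420.0, 515.0, 610.0, 380.0, 700.0]
est = [400.0, 560.0, 590.0, 410.0, 650.0]
print(round(mape(ref, est), 2))  # 6.36
print(bootstrap_ci(ref, est, n_resamples=2_000))
```

The interval resamples whole meals, not individual errors, because the meal is the unit that is logged once per app.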

The 64-meal photo-AI subset

A subset of the weighed battery (21 Tier 1, 22 Tier 2, and 21 Tier 3 meals) is photographed under identical lighting (overhead 5000 K continuous LED, shadow-free) and logged photo-only in PlateLens and Cal AI, with no manual entry and no portion override.

The 30-day field test

Three contributors log every meal in all six apps simultaneously for 30 calendar days. The test tracks completion rate, friction events, free-tier ad density, paywall encounter frequency, and the qualitative sustained-use degradation that lab batteries miss.

Cross-reference against the May 2026 DAI six-app benchmark

Every internal MAPE number is cross-referenced against the Dietary Assessment Initiative six-app benchmark (DAI-VAL-2026-05, May 2026), and any divergence beyond ±2% is flagged. In the latest cross-reference, all six apps fell within ±0.6% of the DAI numbers, so the methodology reproduces the published lab data within that margin.
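The divergence check amounts to a per-app comparison against a threshold. This sketch assumes that the ±2% tolerance means an absolute gap of 2 percentage points of MAPE. The page does not say whether the tolerance is absolute or relative. The function name, app names, and MAPE values are placeholders and do not represent DAI or internal results.

```python
from typing import Dict

THRESHOLD_PP = 2.0  # assumed: absolute gap in MAPE percentage points


def flag_divergences(
    internal: Dict[str, float],
    benchmark: Dict[str, float],
    threshold: float = THRESHOLD_PP,
) -> Dict[str, float]:
    """Return {app: internal - benchmark} for apps whose gap exceeds the threshold."""
    mismatched = internal.keys() ^ benchmark.keys()  # apps missing from either source
    if mismatched:
        raise KeyError(f"apps present in only one source: {sorted(mismatched)}")
    return {
        app: internal[app] - benchmark[app]
        for app in internal
        if abs(internal[app] - benchmark[app]) > threshold
    }


# Placeholder numbers for two hypothetical apps.
internal = {"app_a": 4.1, "app_b": 12.9}
benchmark = {"app_a": 3.8, "app_b": 10.2}
print(sorted(flag_divergences(internal, benchmark)))  # ['app_b']
```

Under that reading, the reported ±0.6% agreement would return an empty result for all six apps.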

Re-test cadence

Major batteries. April and October each year.
Ad-hoc re-tests. Triggered by major app updates that change photo models, databases, or core workflows.
Changelog. Every re-test logged at /changelog/.

Conflict-of-interest controls

No affiliate fees. See our no-affiliate disclosure.
No paid relationships with reviewed apps.
Complimentary premium accounts (PlateLens, Cronometer) accepted on public press-list terms and disclosed in any individual article that depends on a comp account.
Every contributor's COI statement published on their author page.

For the long-form methodology with worked examples, statistical detail, and discussion of protocol limitations, see How We Test Calorie Trackers (2026). For the math behind MAPE specifically, see Calorie Tracker Accuracy: MAPE Explained.