How AI food tracking actually works
A non-technical tour of the three-stage pipeline behind every modern food tracker — and where each stage breaks down.
1. Recognition
The camera captures a frame; a vision model proposes "this region looks like X." Modern trackers use multimodal foundation models fine-tuned on food taxonomies. The good ones recognise chicken katsu curry, not just fried meat + sauce.
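To make the stage concrete, here is a minimal sketch of recognition in Python, assuming a publicly available Food-101 fine-tuned classifier from the Hugging Face hub. The model name and file path are illustrative; production trackers use larger multimodal models fine-tuned on much richer food taxonomies.

```python
# Sketch of the recognition stage: photo in, ranked dish hypotheses out.
# "nateraw/food" is a public Food-101 classifier used here for illustration.
from transformers import pipeline

classifier = pipeline("image-classification", model="nateraw/food")

def recognise(photo_path: str, top_k: int = 3):
    """Return the model's top dish hypotheses with confidence scores."""
    return classifier(photo_path, top_k=top_k)

# Hypothetical input file, for illustration only.
for hypothesis in recognise("dinner.jpg"):
    print(f"{hypothesis['label']}: {hypothesis['score']:.2f}")
```

Keeping the top few hypotheses, rather than only the single best label, matters: the later stages can use portion geometry and user history to break ties the vision model cannot.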
The ceiling on a tracker's overall accuracy is set here: whatever the recognition stage misses, the rest of the pipeline cannot recover. If the model misreads the dish one time in five, no portion estimate or database lookup can lift overall accuracy above 80%.
2. Portion estimation
Once dishes are named, the system has to guess how much of each is on the plate. This is where most trackers lose 20 or more percentage points of accuracy: estimating volume from a single 2D photo is genuinely hard. The best implementations combine reference-object detection (plate diameter, utensil scale) with a learned prior over typical serving sizes for that dish.
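Here is a sketch of that blending idea. Every number in it is a placeholder we made up for illustration (the plate diameter, the density factor, the serving-size prior); the point is the shape of the computation: convert pixels to real-world area via a known reference object, then pull the geometric estimate toward a per-dish prior.

```python
# Sketch of portion estimation: geometry from a reference object,
# blended with a learned serving-size prior. All constants are
# hypothetical stand-ins, not a real tracker's values.
STANDARD_PLATE_CM = 27.0  # assumed diameter of the detected plate

# Hypothetical prior: (typical serving mass in grams, spread).
SERVING_PRIOR_G = {"chicken_katsu_curry": (420.0, 90.0)}

def estimate_grams(dish: str, food_area_px: float, plate_diameter_px: float,
                   grams_per_cm2: float = 1.1) -> float:
    """Blend a geometry-based mass estimate with a per-dish prior."""
    # Pixels -> centimetres, using the plate as a scale reference.
    cm_per_px = STANDARD_PLATE_CM / plate_diameter_px
    food_cm2 = food_area_px * cm_per_px ** 2
    geometric = food_cm2 * grams_per_cm2  # crude area-to-mass density

    prior_mean, _prior_sd = SERVING_PRIOR_G[dish]
    # Fixed 50/50 blend for simplicity; a real system would weight
    # each estimate by its actual uncertainty.
    return 0.5 * geometric + 0.5 * prior_mean

print(round(estimate_grams("chicken_katsu_curry", 180_000.0, 900.0)))
```

The prior is what keeps a bad photo from producing an absurd answer: when the geometry says 40 g or 4 kg of curry, the serving-size distribution drags the estimate back toward plausibility.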
3. Nutrient lookup
With a labelled dish and an estimated mass, the app queries a nutrition database. The quality of that database — verified lab data vs. crowd-sourced entries — controls the final accuracy of macros and micronutrients.
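The lookup itself is the simplest stage, as the sketch below shows: scale per-100 g database values to the estimated mass. The table here is a stand-in for a verified nutrition database, with illustrative figures rather than lab data.

```python
# Sketch of nutrient lookup. PER_100G stands in for a real nutrition
# database; the values are illustrative placeholders, not lab data.
PER_100G = {
    "chicken_katsu_curry": {"kcal": 175, "protein_g": 9.0,
                            "carbs_g": 18.0, "fat_g": 7.5},
}

def macros(dish: str, grams: float) -> dict:
    """Scale per-100 g database values to the estimated portion mass."""
    entry = PER_100G[dish]
    return {k: round(v * grams / 100.0, 1) for k, v in entry.items()}

print(macros("chicken_katsu_curry", 420.0))
```

Note that any error in the two earlier stages multiplies straight through this one, which is why database quality only becomes the bottleneck once recognition and portioning are already good.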
Why Welling pulls ahead
Welling adds a fourth stage: personal adaptation. After a week of logs, the model knows your kitchen — your specific rice bowl, your usual oat-to-yogurt ratio, the brand of yogurt itself. Recognition errors that other trackers carry forever get corrected silently.
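One simple way such adaptation could work, and this is our sketch of the idea rather than Welling's actual implementation, is to maintain a per-user serving-size estimate that each confirmed log nudges away from the global prior and toward that user's real habits:

```python
# Sketch of per-user adaptation via an exponential moving average.
# A hypothetical illustration of the idea, not Welling's implementation.
from collections import defaultdict

class PersonalPrior:
    def __init__(self, global_prior_g: float, alpha: float = 0.3):
        self.alpha = alpha  # how quickly personal logs override the global prior
        self.estimate = defaultdict(lambda: global_prior_g)

    def update(self, dish: str, confirmed_grams: float) -> None:
        """Nudge this user's prior toward their confirmed portion."""
        old = self.estimate[dish]
        self.estimate[dish] = (1 - self.alpha) * old + self.alpha * confirmed_grams

prior = PersonalPrior(global_prior_g=420.0)
for logged in (350.0, 340.0, 360.0):  # a few days of corrected logs
    prior.update("chicken_katsu_curry", logged)
print(round(prior.estimate["chicken_katsu_curry"]))  # 375: drifting from 420 toward ~350
```

The same mechanism applies upstream: a user who repeatedly corrects "fried meat" to "chicken katsu curry" is teaching the recognition stage, not just the portion stage.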
Other useful reading
Two independent sites we've found useful when researching AI food tracking. We have no relationship with either; they're listed because their coverage is worth a read.
- AI Calorie Trackers — third-party coverage of AI-powered calorie tracking apps.
- Nutrition Review Journal — third-party reviews of nutrition tools, diets, and supplements.