Research

Does synthetic data actually work?

We measured it - paired against real-data-only baselines, 7 to 15 seeds, on a frozen test set, across 8 industries and three imaging types, from color photos to X-ray.

The result

Every hardest class moved.

On the single hardest, most starved class in each dataset, adding Synthgen synthetic data to the same real data lifted accuracy by +9 to +35 points.

Dataset · industry · hardest class · real data only → + Synthgen · gain · z

Pill · Pharmaceutical · faulty imprint · 46% → 81% · +34.8 pp · z 5.6

Magnetic Tile · Electronics · blowhole · 56% → 85% · +29.1 pp · z 13.2

Marble · Stone · crack · 51% → 76% · +24.8 pp · z 7.5

Capsule · Pharmaceutical · crack · 38% → 62% · +23.3 pp · z 7.1

Zipper · Textile · broken teeth · 63% → 84% · +21.0 pp · z 3.4

RIAWELC · Welding NDT · porosity · 47% → 66% · +19.1 pp · z 5.2

Metal Nut · Metal hardware · scratch · 71% → 87% · +15.9 pp · z 3.8

PlantVillage · Agriculture · late blight · 50% → 64% · +13.3 pp · z 4.2

Wood · Building materials · hole · 91% → 99% · +9.3 pp · z 3.5

Per-class accuracy on the hardest class in each dataset. Paired vs real-only, 7-15 seeds, frozen test split, placebo-controlled. z = standard deviations above the paired baseline.
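For readers who want to run the same kind of paired comparison on their own runs, here is a minimal sketch in Python. It rests on an assumption: the page does not spell out how z is computed, so the sketch takes z to be the mean per-seed gain divided by its standard error. The function name `paired_gain` and the accuracy values are hypothetical and are not taken from the benchmark.

```python
import math
import statistics


def paired_gain(real_only, with_synth):
    """Return (mean gain in percentage points, z) for per-seed accuracies.

    Both lists hold accuracy in percent, one entry per seed, in the same
    seed order, so each pair shares its training seed and test split.
    ASSUMPTION: z is the mean paired difference over its standard error.
    """
    if len(real_only) != len(with_synth):
        raise ValueError("both arms need one accuracy per seed")
    if len(real_only) < 2:
        raise ValueError("at least two seeds are needed for a spread")

    # Per-seed difference: this is what makes the comparison paired.
    diffs = [s - r for r, s in zip(real_only, with_synth)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    z = mean_diff / std_err if std_err > 0 else math.inf
    return mean_diff, z


# Illustrative numbers only (7 seeds), not measured results.
real_only = [40.0, 42.0, 38.0, 41.0, 39.0, 43.0, 37.0]
with_synth = [60.0, 63.0, 59.0, 60.0, 61.0, 62.0, 58.0]
gain, z = paired_gain(real_only, with_synth)
print(f"gain: +{gain:.1f} pp, z: {z:.1f}")
```

Pairing by seed removes seed-to-seed noise that both arms share, which is why a gain can be judged against the spread of the differences rather than the spread of the raw accuracies.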

Data efficiency

The less real data you have, the more synthetic data pays.

With Synthgen, two labelled examples per class reach roughly the same accuracy as five real ones alone (44.6% vs 45.4% on MVTec Pill) - 60% fewer labels. The lift from Synthgen is widest exactly when your real data is scarcest.

Chart: MVTec Pill, accuracy vs real labelled images per class, paired. Series: real data only and + Synthgen. With + Synthgen, 2 labels per class (44.6%) come within 0.8 points of 5 real labels alone (45.4%), and the lift is largest when real data is scarcest.
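The label-saving figure follows directly from the two points quoted in the caption. A small worked check in Python, using only those two points (the helper name `label_savings` is hypothetical):

```python
def label_savings(labels_with_synth, labels_real_only):
    """Percent fewer real labels per class needed to reach a similar accuracy."""
    if labels_real_only <= 0:
        raise ValueError("labels_real_only must be positive")
    return 100.0 * (labels_real_only - labels_with_synth) / labels_real_only


# Figures quoted for MVTec Pill: 2 labels + Synthgen vs 5 real labels alone.
saving = label_savings(2, 5)
accuracy_gap = 45.4 - 44.6  # percentage points still separating the two settings

print(f"labels saved: {saving:.0f}%")
print(f"remaining accuracy gap: {accuracy_gap:.1f} pp")
```

The remaining gap is why the two settings are best described as roughly equal in accuracy rather than identical.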

“Every hardest-class gain held up paired over 7 to 15 random seeds, on a frozen test set we never touched during training.”

Synthgen internal benchmark

Where it fits

Strongest where you're weakest.

Synthetic data pays exactly where real data is scarce and the class is hard - the rare, costly classes you can't collect enough of.

Rare, hard classes

The appearance-based classes starved of real examples - exactly where models fail and where it costs you most.

Few-shot and cold start

The less real data you have, the more synthetic data is worth - a working model before you have collected hundreds of examples.

Weak-spot localization

Pinpointing where the defect is in the image, on exactly the starved cases the real-only model localizes worst.

Run it on your data

See the lift on your hardest class.

Bring your most-confused, least-labelled class. We will show you the gain on a pilot.

Book a pilot
