
CASE STUDY · AI PRODUCTS
We made non-deterministic AI defects reproducible.
Key results: 0 in-house QA hires · 3 AI products · 8 engineers · ~7 months
- Industry: AI products
- Engagement: Embedded QA
- Coverage: 3 AI products · 8 engineers
- Period: ~7 months, ongoing
The Challenge
AppsUY ships AI-driven products whose failures don't repeat on command. A model gives a wrong answer once, then can't be reproduced — so it never gets fixed, it just resurfaces in front of the next user. The team needed a way to turn "it happened sometimes" into a defect an engineer can actually close.
What QARTY Delivered
- Studied each product's AI behavior and failure surface before testing
- Designed structured scenarios that pin down non-deterministic outputs
- Captured inputs, seeds, and context so every defect reproduces on demand
- Handed off a repeatable process the engineering team runs without us
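This case study does not describe QARTY's actual tooling. The core idea of capturing inputs, seeds, and context can still be illustrated with a minimal Python sketch. It makes two assumptions: the randomness in an AI call can be driven by an explicit seed (true for many local sampling setups, and only best-effort for some hosted APIs), and the names `ReproRecord`, `toy_model`, `run_and_capture`, and `replay` are hypothetical. A toy random "model" stands in for a real one.

```python
import json
import random
from dataclasses import asdict, dataclass, field


@dataclass
class ReproRecord:
    """Hypothetical schema: everything needed to replay one AI call."""
    prompt: str
    seed: int
    temperature: float
    context: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # Serialize so the record can be attached to a bug report.
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(raw: str) -> "ReproRecord":
        return ReproRecord(**json.loads(raw))


def toy_model(prompt: str, temperature: float, rng: random.Random, context: dict) -> str:
    """Stand-in for a real model (assumption): output varies between runs
    unless the random source is seeded."""
    words = ["yes", "no", "maybe"] + list(context.get("extra_words", []))
    if temperature == 0:
        return words[0]  # greedy: always the first candidate
    return rng.choice(words)


def run_and_capture(prompt, temperature, context, seed=None):
    """Run the model and return its output plus a record that reproduces it."""
    if seed is None:
        # Pick a fresh seed, but remember it so the run can be replayed.
        seed = random.SystemRandom().randrange(2**32)
    rng = random.Random(seed)
    output = toy_model(prompt, temperature, rng, context)
    return output, ReproRecord(prompt, seed, temperature, context)


def replay(record: ReproRecord) -> str:
    """Re-run the exact call described by a captured record."""
    rng = random.Random(record.seed)
    return toy_model(record.prompt, record.temperature, rng, record.context)


if __name__ == "__main__":
    out, rec = run_and_capture("Is the invoice overdue?", 0.8, {"extra_words": ["unknown"]})
    saved = rec.to_json()  # this string is what an engineer would receive
    assert replay(ReproRecord.from_json(saved)) == out
    print(out, saved)
```

The design choice is to make the seed a recorded input instead of hidden state. Once it sits in the record next to the prompt and context, a one-off failure becomes a replayable test case.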
Results
- Previously non-reproducible AI defects became reliably reproducible
- Engineers fix root causes instead of chasing ghosts
- Quality coverage across 3 AI products with no in-house QA hire
“Validating AI-first platforms demands a QA approach that goes far beyond conventional testing, and QARTY rose to the challenge. A professional, proactive team — an indispensable ally for any developer.”
Matías González, Founder & CEO, AppsUY
Key Insight
Non-determinism is not untestable — it's under-instrumented. Once you capture the full input and context around a failure, an "unreproducible" AI bug becomes an ordinary one.
Products covered
- Automation Flow · non-deterministic
- Analytics · data
Our approach
- Domain study
- Structured test scenarios
- Reproduce & isolate
- Process handoff