CASE STUDY · AI PRODUCTS

We made non-deterministic AI defects reproducible.


Industry: AI products
Engagement: Embedded QA
Coverage: 3 AI products · 8 engineers
Period: ~7 months, ongoing

The Challenge

AppsUY ships AI-driven products whose failures don't repeat on command. A model gives a wrong answer once, and the failure can't be reproduced afterward, so it never gets fixed and simply resurfaces in front of the next user. The team needed a way to turn "it happened sometimes" into a defect an engineer can actually close.

What QARTY Delivered

Studied each product's AI behavior and failure surface before testing
Designed structured scenarios that pin down non-deterministic outputs
Captured inputs, seeds, and context so every defect reproduces on demand
Handed off a repeatable process the engineering team runs without us
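
To illustrate the capture step, here is a minimal sketch of a reproduction record in Python. The case study does not describe QARTY's actual tooling, so everything here is an assumption for illustration: the names `ReproRecord`, `capture`, `replay`, and `fake_model` are hypothetical, and `fake_model` is a stand-in whose output depends only on the captured prompt, seed, and context. A real model call would also need its version, sampling parameters, and any retrieved documents recorded in the context.

```python
import hashlib
import json
import random
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ReproRecord:
    """Everything needed to re-run one model call (hypothetical schema)."""
    prompt: str
    seed: int
    context: dict  # e.g. sampling parameters, model version, session state
    output: str

    def fingerprint(self) -> str:
        # Stable short ID so the same failure can be referenced in a ticket.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def fake_model(prompt: str, seed: int, context: dict) -> str:
    """Stand-in for a real model: output is a pure function of its inputs."""
    key = f"{seed}|{prompt}|{json.dumps(context, sort_keys=True)}"
    rng = random.Random(key)
    return rng.choice(["correct answer", "correct answer", "wrong answer"])


def capture(prompt: str, seed: int, context: dict) -> ReproRecord:
    """Run the model once and store the inputs alongside the output."""
    output = fake_model(prompt, seed, context)
    return ReproRecord(prompt=prompt, seed=seed, context=context, output=output)


def replay(record: ReproRecord) -> bool:
    """Re-run from the record and report whether the output matches."""
    rerun = fake_model(record.prompt, record.seed, record.context)
    return rerun == record.output


if __name__ == "__main__":
    record = capture("What is the refund window?", 42, {"temperature": 0.7})
    print(record.fingerprint(), record.output, "replays:", replay(record))
```

The record serializes to JSON, so it can be attached to a bug report and replayed on another machine. The sketch only reproduces on demand because the stand-in model is fully determined by the captured fields; any input left out of the record is a source of non-reproducibility.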

Results

Previously non-reproducible AI defects became reliably reproducible
Engineers fix root causes instead of chasing ghosts
Quality coverage across 3 AI products with no in-house QA hire

“Validating AI-first platforms demands a QA approach that goes far beyond conventional testing, and QARTY rose to the challenge. A professional, proactive team — an indispensable ally for any developer.”

Matías González, Founder & CEO, AppsUY

Key Insight

Non-determinism is not untestable — it's under-instrumented. Once you capture the full input and context around a failure, an "unreproducible" AI bug becomes an ordinary one.
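
One way to picture this insight is a seed sweep: run the same scenario under many explicit seeds and keep the seeds that fail. This is a hedged sketch, not the method used in the engagement. `flaky_agent` is a hypothetical stand-in that answers wrongly for some seeds, and the 20% failure rate and the `sweep` helper are assumptions made for the example.

```python
import random


def flaky_agent(question: str, seed: int) -> str:
    """Hypothetical agent: wrong for some seeds, but the same seed always agrees."""
    rng = random.Random(seed)
    return "wrong" if rng.random() < 0.2 else "right"


def sweep(question: str, seeds, expected: str = "right") -> list:
    """Run one scenario under each seed and return the seeds that fail."""
    return [s for s in seeds if flaky_agent(question, s) != expected]


if __name__ == "__main__":
    question = "Summarize the latest invoice."
    seeds = range(200)
    failing = sweep(question, seeds)
    print(f"failure rate: {len(failing) / len(seeds):.1%}")
    # Each failing seed is now an ordinary, repeatable defect.
    for seed in failing[:3]:
        assert flaky_agent(question, seed) == "wrong"
        print("reproduces with seed", seed)
```

Without a recorded seed, the failure looks like "it happened sometimes". With one, an engineer can re-run the exact failing case as often as needed.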

Products covered

AI Agent: AI agent
Automation Flow: non-deterministic
Analytics: data

Our approach

Domain study
Structured test scenarios
Reproduce & isolate
Process handoff

Want to know what reaches production before your users do?

Let's talk for 20 minutes