
CASE STUDY · AI PRODUCTS

AppsUY × QARTY

We made non-deterministic AI defects reproducible.

Key results: 0 in-house QA hires · 3 AI products · 8 engineers · ~7 months

Industry: AI products
Engagement: Embedded QA
Coverage: 3 AI products · 8 engineers
Period: ~7 months, ongoing

The Challenge

AppsUY ships AI-driven products whose failures don't repeat on command. A model gives a wrong answer once, and nobody can reproduce it afterwards, so the defect never gets fixed; it just resurfaces in front of the next user. The team needed a way to turn "it happened sometimes" into a defect an engineer can actually close.

What QARTY Delivered

  • Studied each product's AI behavior and failure surface before testing
  • Designed structured scenarios that pin down non-deterministic outputs
  • Captured inputs, seeds, and context so every defect reproduces on demand
  • Handed off a repeatable process the engineering team runs without us
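To make the capture step above concrete, here is a minimal sketch of recording a model call so it can be replayed later. It is an illustration only, not QARTY's actual tooling: `CapturedRun`, `fake_model`, `run_and_capture`, and `replay` are hypothetical names, and `fake_model` is a seeded stand-in for a real model. Real model APIs may not be fully deterministic even with a fixed seed, so in practice the record would likely need more fields (model version, system prompt, tool outputs).

```python
import json
import random
from dataclasses import dataclass, asdict, field


@dataclass
class CapturedRun:
    """Everything needed to replay one model call (hypothetical schema)."""
    prompt: str
    seed: int
    params: dict = field(default_factory=dict)
    context: list = field(default_factory=list)
    output: str = ""


def fake_model(prompt, seed, params, context):
    """Stand-in for a real model: the output depends only on the seed here."""
    rng = random.Random(seed)
    return rng.choice(["Paris", "Paris", "Paris", "Lyon"])


def run_and_capture(prompt, seed, params=None, context=None):
    """Call the model and keep the full input alongside the output."""
    params = params or {}
    context = context or []
    output = fake_model(prompt, seed, params, context)
    return CapturedRun(prompt, seed, params, context, output)


def replay(record_json):
    """Re-run a saved call from its JSON record and return the new output."""
    data = json.loads(record_json)
    return fake_model(data["prompt"], data["seed"], data["params"], data["context"])


run = run_and_capture("Capital of France?", seed=1234, params={"temperature": 0.8})
saved = json.dumps(asdict(run))       # this string is what gets attached to the bug report
assert replay(saved) == run.output    # the same record yields the same output
```

The point of the design is that the bug report carries the record itself, so an engineer can replay the failure without reconstructing the session by hand.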

Results

  • Previously non-reproducible AI defects became reliably reproducible
  • Engineers fix root causes instead of chasing ghosts
  • Quality coverage across 3 AI products with no in-house QA hire

"Validating AI-first platforms demands a QA approach that goes far beyond conventional testing, and QARTY rose to the challenge. A professional, proactive team, an indispensable ally for any developer."
Matías González, Founder & CEO, AppsUY

Key Insight

Non-determinism is not untestable — it's under-instrumented. Once you capture the full input and context around a failure, an "unreproducible" AI bug becomes an ordinary one.

Products covered

  • AI Agent: AI agent
  • Automation Flow: non-deterministic
  • Analytics: data

Our approach

  1. Domain study
  2. Structured test scenarios
  3. Reproduce & isolate
  4. Process handoff
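Steps 2 and 3 above could look roughly like the sketch below: one structured scenario is run across many seeds, and the seeds that fail the check are kept as reproducible cases. This is an assumption about how such a harness might be built, not a description of the process delivered to AppsUY. `fake_agent` and `run_scenario` are hypothetical names, and the roughly 10% failure rate is invented for illustration.

```python
import random


def fake_agent(question, seed):
    """Hypothetical stand-in for an AI product call; wrong for a minority of seeds."""
    rng = random.Random(seed)
    return "Lyon" if rng.random() < 0.1 else "Paris"


def run_scenario(question, check, seeds):
    """Run one scenario once per seed; return the seeds whose output fails the check."""
    return [s for s in seeds if not check(fake_agent(question, s))]


def is_correct(output):
    """The scenario's pass condition."""
    return output == "Paris"


failing = run_scenario("Capital of France?", is_correct, range(200))
print(f"{len(failing)} of 200 runs failed; first failing seeds: {failing[:3]}")

# Each failing seed is now an ordinary, repeatable defect.
for seed in failing:
    assert not is_correct(fake_agent("Capital of France?", seed))
```

Running the same scenario again with the same seeds returns the same failing list, which is what turns "it happened sometimes" into a case an engineer can step through.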

Want to know what reaches production before your users do?