The 349 Workflow Audit — What We Look For
We run a “349” audit because we needed a repeatable way to evaluate client workflows quickly. The number isn’t mystical: it’s shorthand for a three-step, four-layer, nine-question framework that surfaces where AI can help and where it will hurt.
The framework, at a glance
- Steps (3): intake, processing, output. Any workflow can be decomposed into these stages.
- Layers (4): data quality, decision logic, action surface, observability. Each layer reveals different risks and opportunities.
- Questions (9): nine focused probes, like “where are edge cases handled?” and “who owns recovery?” These drive the score.
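The decomposition above can be sketched as plain data. This is a minimal illustration, not the audit's actual tooling; all names are hypothetical, and only the two probe questions quoted above are filled in.

```python
# Hypothetical sketch of the 3-step, 4-layer, 9-question framework.
# Names are illustrative, not the audit's real field names.

STEPS = ("intake", "processing", "output")

LAYERS = ("data quality", "decision logic", "action surface", "observability")

QUESTIONS = (
    "Where are edge cases handled?",
    "Who owns recovery?",
    # ...the remaining seven probes are omitted here
)
```

Keeping the framework as data (rather than prose in a slide deck) makes it easy to generate a checklist per workflow and track which probes have been answered.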
What we measure
- Reliability: are results consistent across inputs?
- Latency: is the workflow time-sensitive?
- Error cost: how bad is a mistake? Financially? Reputationally?
- Automation potential: fraction of tasks safe to automate without human oversight.
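One record per audited workflow might capture these four dimensions like so. This is a sketch under assumed field names and scales; the article does not specify how the dimensions are encoded.

```python
from dataclasses import dataclass

@dataclass
class WorkflowMeasurements:
    """One record per audited workflow, mirroring the four dimensions.

    Field names and scales are illustrative assumptions, not the
    audit's real schema.
    """
    reliability: float           # consistency of results across inputs, 0-1
    latency_sensitive: bool      # is the workflow time-sensitive?
    error_cost: str              # e.g. "low", "medium", "high"
    automation_potential: float  # fraction of tasks safe to automate, 0-1
```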
Three short examples
1) Contract redlines (legal tech)
Problem: teams used models to suggest redlines and then accepted them wholesale. Risk: missing semantic obligations. In the audit we found poor verification (no clause-level checks) and no rollback. Score: low for safety, medium for automation potential if narrow validators are introduced.
2) Customer support triage (SaaS)
Problem: a support queue where models classified intent and suggested replies. The team wanted higher automation. We found the model was fine for routing but terrible at promises that required back-end checks. Our recommendation: automate routing and canned replies for non-action requests, but gate account-sensitive replies behind a human-in-the-loop. Score: high automation for routing, low for autonomous replies.
3) Field-service scheduling (logistics)
Problem: multiple constraints (technician skills, vehicle availability, customer windows). The client had a heuristic scheduler that broke under volume. We found opportunities for constrained optimization and a clear feedback loop; however, data quality (missing skill tags) was the blocker. Recommendation: invest in clean canonical data, plug in a constrained scheduler, and run shadow mode for 30 days. Score: high potential, but medium readiness.
How we score
We convert qualitative findings into a readiness score out of 100. Anything above 70 is ready for measured automation; 40–70 needs remediation; under 40 we advise human-first workflows and data cleanup.
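The banding logic is simple enough to state as code. A minimal sketch using the thresholds above; the function name and band labels are assumptions.

```python
def readiness_band(score: int) -> str:
    """Map a 0-100 readiness score to a recommendation band.

    Thresholds follow the article: above 70 is ready for measured
    automation, 40-70 needs remediation, under 40 stays human-first
    with data cleanup. Name and labels are illustrative.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score > 70:
        return "measured automation"
    if score >= 40:
        return "remediation"
    return "human-first + data cleanup"
```

Note that 70 itself falls in the remediation band, since only scores strictly above 70 qualify for measured automation.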
Common remediation checklist
- Add explicit validators for model outputs
- Introduce shadow mode to compare automated vs human outcomes before go-live
- Provide human-in-the-loop for high-cost decisions
- Improve data hygiene: canonical fields, consistent timestamps, unique IDs
- Instrument rollback and monitoring
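Shadow mode in particular is worth spelling out, because teams often conflate it with a pilot. A minimal sketch, assuming a human decision function and a model under evaluation; production still acts only on the human decision, while the model's output is recorded to measure agreement before go-live. All names are hypothetical.

```python
from typing import Callable, Iterable

def shadow_mode(
    cases: Iterable[dict],
    human: Callable[[dict], str],
    model: Callable[[dict], str],
) -> float:
    """Run the model alongside the human workflow without acting on it.

    Returns the agreement rate between human and model decisions.
    Illustrative only; a real deployment would also log per-case
    disagreements for review.
    """
    total = agree = 0
    for case in cases:
        applied = human(case)   # production still runs human-first
        shadow = model(case)    # model output is logged, never applied
        total += 1
        agree += int(applied == shadow)
    return agree / total if total else 0.0
```

An agreement rate tracked over the 30-day shadow window (as in the scheduling example above) is what justifies, or blocks, flipping a workflow to automation-first.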
Why this matters
Too many AI projects try to automate before understanding failure modes. The 349 audit is blunt: it forces teams to be honest about where automation will reduce cost and where it will increase risk. Our clients use the score to prioritize engineering work and to decide which workflows to run in automation-first vs human-first modes.
Final thought
Audits are not kill-switches. They’re roadmaps. If you want to do meaningful automation, start with a tight framework, measure honestly, and build the smallest set of controls that make you comfortable. The 349 audit is our blueprint for moving from interesting AI experiments to production-safe workflows.