Edge Case Evaluator.

How I built a system that forces AI
to map every possible failure before
writing a single line of code.

Type: Technical Article
Topic: AI Risk Analysis
Date: February 2026
Commands: 17
Abstract

AI Solves Before It Thinks

AI coding assistants are remarkably good at generating solutions. Give them a problem, and they produce working code in seconds. But there is a fundamental flaw in this workflow:

AI jumps to solutions before understanding the full landscape of what could go wrong.

This is not a capability problem. It is a sequencing problem. The model has the knowledge to identify edge cases, trace consequences, and weigh trade-offs. It just does not do it unless you force it to.

I built ECE (Edge Case Evaluator) to solve this. It is a structured command system that makes AI map the “butterfly effect” of every decision before proposing any solution. Think of it as a pre-flight checklist for problem-solving.

The Problem

What AI Does Not Do

When you ask an AI assistant to solve a design or engineering problem, it typically does three things:

1. Picks the first reasonable approach
2. Generates a solution for it
3. Moves on

What it does not do:

Compare Approaches
Consider 3 to 5 fundamentally different solutions to the same problem.
Trace Failures
Identify what could go wrong with each approach and follow those failures to worst-case outcomes.
Audit Itself
Check whether its own analysis has been influenced by anchoring, confirmation, or user-preference bias.

The result: Solutions that work in the happy path but break in production. Edge cases discovered during QA instead of during design. Approaches chosen because they “felt right,” not because they were systematically evaluated.

Solution

How ECE Works

ECE is a set of 17 structured commands that install into any AI coding environment. Each command forces the AI into a specific mode of thinking, with strict rules about what it can and cannot do.

The core workflow follows a deliberate sequence:

1. Map
2. Expand
3. Validate
4. Audit
5. Optimize
6. Synthesize
7. Review
8. Close

Each step has guardrails. The AI cannot skip ahead. It cannot propose solutions during the mapping phase. It cannot optimize without first checking its own bias.
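
To make the sequencing concrete, here is a minimal TypeScript sketch of how such a phase gate could work. The phase names come from the workflow above; the `PhaseGate` class and its methods are illustrative assumptions, not ECE's actual implementation.

```typescript
// Hypothetical sketch of ECE's phase ordering (not the real implementation).
const PHASES = [
  "map", "expand", "validate", "audit", "optimize", "synthesize", "review", "close",
] as const;

type Phase = (typeof PHASES)[number];

class PhaseGate {
  private completed = new Set<Phase>();

  // A phase may only run once every earlier phase has completed.
  canRun(phase: Phase): boolean {
    const index = PHASES.indexOf(phase);
    return PHASES.slice(0, index).every((p) => this.completed.has(p));
  }

  complete(phase: Phase): void {
    if (!this.canRun(phase)) {
      throw new Error(`Cannot run "${phase}" before earlier phases are complete.`);
    }
    this.completed.add(phase);
  }
}

const gate = new PhaseGate();
gate.complete("map");         // allowed: mapping comes first
// gate.complete("optimize"); // would throw: the audit step has not run yet
```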

Architecture

Spine + Leaf: Token-Efficient Evaluation

ECE uses a Spine + Leaf architecture designed for token efficiency:

The Spine
A compact map file (spine.md) containing:
+ Mermaid diagram of the full decision tree
+ Risk table listing every identified edge case
+ Risk ratings, probability estimates, confidence levels
+ The recommended route (once optimization is done)
The Leaves
Individual edge case files (nodes/<id>.md) containing:
+ Full butterfly effect chain (cause to terminal outcome)
+ Mitigation strategies
+ Cross-branch dependency links
+ Validation results

The key constraint: only one leaf file is loaded at a time. This prevents token bloat and forces focused analysis. The spine is always loaded for context, but detailed work happens one node at a time.
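
To make the split concrete, here is a minimal TypeScript model of the data the spine and leaves carry. The field names are assumptions based on the descriptions above, not ECE's actual file schema.

```typescript
// Hypothetical shape of the spine and leaf data described above (not ECE's real schema).
type RiskLevel = "critical" | "medium" | "low";
type Probability = "almost-certain" | "likely" | "possible" | "unlikely" | "rare";
type Confidence = "high" | "medium" | "low";

// spine.md: always loaded, kept compact.
interface Spine {
  decisionTreeMermaid: string;   // Mermaid source of the full decision tree
  riskTable: RiskTableRow[];     // one row per identified edge case
  recommendedRoute?: string;     // filled in once /ece optimize has run
}

interface RiskTableRow {
  nodeId: string;                // e.g. "B2", pointing to nodes/B2.md
  risk: RiskLevel;
  probability: Probability;
  confidence: Confidence;
}

// nodes/<id>.md: only one of these is loaded at a time during detailed work.
interface LeafNode {
  id: string;
  butterflyChain: string[];      // cause -> ... -> terminal outcome
  mitigations: string[];
  dependsOn: string[];           // cross-branch dependency links
  validation?: string;           // result of /ece validate, if run
}
```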

Commands

The Full Command Set

Core Loop (8 commands)

| Command | Purpose |
| --- | --- |
| /ece map | Creates the initial butterfly-effect tree with 3 to 5 approaches and their immediate edge cases |
| /ece recommend | Analyzes all open nodes and suggests exactly which one to investigate next, with reasoning |
| /ece expand | Deep-dives into a specific node, tracing its downstream effects to a terminal outcome |
| /ece validate | Stress-tests a proposed mitigation to check if the fix itself introduces new risks |
| /ece trace | Follows a complete chain from any node to all its terminal outcomes |
| /ece blindspot | Adversarial self-audit for anchoring bias, confirmation bias, and user influence contamination |
| /ece optimize | Scores all approaches using weighted risk-probability and recommends the safest route |
| /ece synthesize | Generates a concrete solution blueprint with requirements and acceptance criteria |

Solution Verification (2 commands)

| Command | Purpose |
| --- | --- |
| /ece review | Evaluates a UI screenshot against the solution blueprint, checking acceptance criteria and accessibility |
| /ece close | Finalizes the evaluation with a health score, archives results, and generates a history summary |

Prototyping (1 command)

| Command | Purpose |
| --- | --- |
| /ece wireframe | Generates an interactive React/Next.js wireframe from the solution blueprint. Requires prior /ece synthesize. Produces a two-panel layout: a control panel with toggle conditions, priority badges, and acceptance criteria on the left, and the primary interaction zone with state visualization on the right. All conditions derive from solution.md; nothing is invented. |

Session Management (2 commands)

| Command | Purpose |
| --- | --- |
| /ece resume | Detects where a previous evaluation was paused and tells you exactly what to do next |
| /ece status | Full node listing with risk, probability, confidence, and a smart next-action suggestion |

Utilities (4 commands)

| Command | Purpose |
| --- | --- |
| /ece test-gen | Auto-generates Given/When/Then test cases from butterfly effect chains |
| /ece impact | Builds a cross-reference matrix for compound failures between edge cases |
| /ece patterns | Identifies recurring risk categories across completed evaluations |
| /ece reset | Clears all evaluation data with confirmation |

Scoring

Risk Quantification

Every node in the tree gets rated on two dimensions. The combination produces a weighted score that makes risk comparison objective rather than intuitive.

| Risk Level | Points |
| --- | --- |
| Critical (Red) | 10 |
| Medium (Yellow) | 3 |
| Low (Green) | 1 |

| Probability | Multiplier |
| --- | --- |
| Almost Certain | 1.0 |
| Likely | 0.7 |
| Possible | 0.4 |
| Unlikely | 0.2 |
| Rare | 0.05 |

Weighted Score = Risk Level × Probability. A critical edge case that is almost certain to occur (score: 10.0) is treated very differently from a critical edge case that is rare (score: 0.5). During optimization, approaches are scored by summing all weighted scores in their branch.
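
The arithmetic is simple enough to sketch directly. The following TypeScript uses the point values and multipliers from the tables above; the function names are illustrative, not part of ECE itself.

```typescript
// Scoring constants taken from the tables above.
const RISK_POINTS = { critical: 10, medium: 3, low: 1 } as const;
const PROBABILITY_MULTIPLIER = {
  "almost-certain": 1.0,
  likely: 0.7,
  possible: 0.4,
  unlikely: 0.2,
  rare: 0.05,
} as const;

type Risk = keyof typeof RISK_POINTS;
type Probability = keyof typeof PROBABILITY_MULTIPLIER;

interface ScoredNode {
  risk: Risk;
  probability: Probability;
}

// Weighted Score = Risk Level x Probability
function weightedScore(node: ScoredNode): number {
  return RISK_POINTS[node.risk] * PROBABILITY_MULTIPLIER[node.probability];
}

// During optimization, an approach is scored by summing every node in its branch.
function branchScore(branch: ScoredNode[]): number {
  return branch.reduce((sum, node) => sum + weightedScore(node), 0);
}

weightedScore({ risk: "critical", probability: "almost-certain" }); // 10.0
weightedScore({ risk: "critical", probability: "rare" });           // 0.5
```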

Key Feature

The Blindspot System

This is the feature that makes ECE different from any other planning tool. AI models are susceptible to specific biases during extended analysis:

Anchoring Bias
The first approach identified becomes the unconscious favorite.
Confirmation Bias
Once a risk is rated, the AI tends to confirm rather than challenge it.
Sunk Cost Bias
Nodes that have been heavily expanded feel more important even when they are not.
External Contamination
If the user expresses a preference mid-evaluation, the AI may unconsciously weight it.
Recency Bias
The most recently expanded node feels most urgent.
Missing Perspectives
The AI defaults to its most familiar lens, missing security, accessibility, or business angles.

The /ece blindspot command runs an adversarial self-audit with five questions on every node:

1. Risk Accuracy: Would I rate this the same if I had never seen the other nodes?
2. Probability Validity: Is this probability based on evidence or assumption?
3. Anchoring Check: Was this rating influenced by its parent or sibling?
4. Contamination Check: Did user preferences influence this rating?
5. Missing Perspectives: Is there a risk angle I have not considered?

With the --lens flag, it adds a sixth question through a specific expert perspective (security, accessibility, performance, business, or resilience).
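
Conceptually, the audit is a fixed question set plus an optional lens. The sketch below encodes the five questions from the list above; the types and the wording of the lens question are my assumptions, not ECE's internals.

```typescript
// Illustrative encoding of the blindspot question set (not ECE's internals).
const BLINDSPOT_QUESTIONS = [
  "Risk Accuracy: Would I rate this the same if I had never seen the other nodes?",
  "Probability Validity: Is this probability based on evidence or assumption?",
  "Anchoring Check: Was this rating influenced by its parent or sibling?",
  "Contamination Check: Did user preferences influence this rating?",
  "Missing Perspectives: Is there a risk angle I have not considered?",
] as const;

type Lens = "security" | "accessibility" | "performance" | "business" | "resilience";

// With --lens, a sixth question is asked from the chosen expert perspective.
function auditQuestions(lens?: Lens): string[] {
  const questions: string[] = [...BLINDSPOT_QUESTIONS];
  if (lens) {
    questions.push(`Lens (${lens}): What would a ${lens} expert flag in this node?`);
  }
  return questions;
}
```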

ECE requires at least one blindspot check before allowing optimization. Unchecked bias produces unreliable recommendations.

Validation

Real-World Test: Blind User Navigation

To validate ECE, I ran it on a real accessibility problem:

Problem Statement

A navigation app for blind users has two large buttons (Stop and Repeat). The user needs a third “Call for Assistance” button, but the existing buttons are already maximized for accessibility. How do you integrate a third interaction without compromising the experience?

What ECE Mapped

Four fundamentally different approaches:

A. Triple Vertical Split: Add a third button by shrinking all three
B. Long-Press Gesture: Keep two buttons, add a hidden gesture
C. Shake to Call: Use device motion as input
D. Voice Command: Use speech recognition

What the Blindspot Check Caught

Rating Corrections
- "Muscle Memory Conflict" was upgraded from Medium to Critical. Changing the Stop button's position in a safety-critical navigation app is a safety hazard, not a UX issue.
- "Voice Command" reliability was corrected from Low risk to Medium. Blind users navigate in noisy real-world environments.

Missing Perspective
- The entire evaluation was missing a "Network Latency" perspective. What happens if the assistance call fails to connect?

Without the blindspot check, the optimization would have recommended an approach with undetected critical risks.

Process

Building and Testing the 55 Edge Cases

The blind user navigation evaluation did not stop at four approaches with a handful of risks. Using the full ECE workflow, I expanded every critical and medium-risk node, traced each one to its terminal outcome, and validated mitigations. The initial /ece map produced 4 approaches with 12 immediate edge cases. Successive rounds of /ece expand and /ece trace uncovered downstream failures that were invisible at the surface level. By the time the tree was fully mapped, the evaluation contained 55 distinct edge cases across the four approach branches.

How the Count Grew

| Phase | Command Used | Edge Cases Found | What Surfaced |
| --- | --- | --- | --- |
| Initial Map | /ece map | 12 | First-level risks per approach: gesture conflicts, haptic ambiguity, voice reliability, touch target sizing |
| Critical Expansion | /ece expand | +18 | Second-level cascades: what happens when the first failure triggers a second (e.g., a haptic misfire during obstacle proximity causes a wrong turn) |
| Terminal Tracing | /ece trace | +11 | Worst-case chains: a missed obstacle alert leading to a wall collision leading to panic leading to emergency call failure |
| Validation Risks | /ece validate | +8 | Mitigation side-effects: adding a confirmation dialog to the call button introduces a delay that is dangerous during an emergency |
| Blindspot Audit | /ece blindspot | +6 | Missing perspectives: network latency during calls, battery drain from continuous haptic feedback, multi-floor elevator transitions |

Testing Against the Design

Each of the 55 edge cases was tested against the final UI design using a three-step process:

1. Test Case Generation: Used /ece test-gen to produce Given/When/Then acceptance tests from each butterfly chain. 55 edge cases produced 78 test scenarios (some chains had multiple terminal paths). A hypothetical example of one generated scenario follows this list.
2. Screenshot Review: Used /ece review to evaluate each screen against the synthesized blueprint. The review checked whether acceptance criteria were met and flagged accessibility gaps.
3. Impact Cross-Reference: Used /ece impact to build a compound failure matrix. This identified 7 cases where two independent edge cases, if triggered simultaneously, would produce a third failure not captured in either branch.
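
To give a sense of the output format, here is a hypothetical Given/When/Then scenario of the kind /ece test-gen produces. The specific scenario, node ID, and field names are invented for illustration; they are not taken from the actual evaluation output.

```typescript
// Hypothetical example of a generated test scenario (illustrative only).
interface TestScenario {
  edgeCaseId: string;   // the leaf node this scenario was derived from
  given: string;
  when: string;
  then: string;
}

const assistanceCallOffline: TestScenario = {
  edgeCaseId: "B3",
  given: "The user is mid-route and the long-press assistance gesture is armed",
  when: "The network drops while the assistance call is being placed",
  then: "The failure is announced through haptics and speech, and the call is retried",
};
```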

From Edge Cases to Wireframe

After optimization selected the long-press gesture approach as the safest route, /ece synthesize generated the solution blueprint with requirements and constraints derived directly from the 55 edge cases. From there, /ece wireframe produced an interactive React prototype with a two-panel layout: a left panel showing all toggle conditions, priority badges, and the acceptance criteria checklist, and a right panel with the primary interaction zone where toggling conditions visually demonstrated how the UI responded to each edge case. The wireframe became the bridge between the abstract risk analysis and the concrete design decisions visible in the final screens.
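
As a rough sketch of what that two-panel layout looks like in code, here is a stripped-down React/TypeScript component. The component name, condition labels, and state handling are illustrative assumptions; the real wireframe derives its conditions from solution.md.

```tsx
// Illustrative sketch of the two-panel wireframe layout (not the generated code).
import { useState } from "react";

// In the real wireframe these conditions come from solution.md; these are placeholders.
const CONDITIONS = ["Network offline", "Low battery", "Noisy environment"] as const;
type Condition = (typeof CONDITIONS)[number];

export function WireframeSketch() {
  const [active, setActive] = useState<Set<Condition>>(new Set());

  const toggle = (condition: Condition) => {
    setActive((prev) => {
      const next = new Set(prev);
      if (next.has(condition)) next.delete(condition);
      else next.add(condition);
      return next;
    });
  };

  return (
    <div style={{ display: "flex", gap: 16 }}>
      {/* Left panel: toggle conditions (priority badges and criteria omitted here) */}
      <aside>
        {CONDITIONS.map((condition) => (
          <label key={condition} style={{ display: "block" }}>
            <input
              type="checkbox"
              checked={active.has(condition)}
              onChange={() => toggle(condition)}
            />
            {condition}
          </label>
        ))}
      </aside>
      {/* Right panel: primary interaction zone reflecting the active edge states */}
      <main>
        <p>Active edge states: {[...active].join(", ") || "none"}</p>
      </main>
    </div>
  );
}
```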

55 edge cases sounds like a large number. But consider that standard AI planning for the same problem identified 3. The remaining 52 were discovered only because the evaluation forced systematic expansion, tracing, and bias auditing at every layer.

Self-Evaluation

Does ECE Actually Work?

I used ECE to evaluate ECE. The full evaluation is at .ece/evaluations/ece-vs-standard-planning/. I mapped 4 approaches with 16 edge cases, expanded 5 critical nodes, ran a blindspot audit on my own analysis, and optimized. Here is what the evaluation found.

Where ECE Outperforms Standard AI

| Dimension | Standard AI | ECE |
| --- | --- | --- |
| Edge case coverage | Identifies 1 to 3 inline | Maps 12+ across 4+ approaches |
| Approach comparison | Recommends one (usually the most common) | Enumerates and scores multiple |
| Bias detection | None | Adversarial self-audit with targeted corrections |
| Risk quantification | Qualitative ("this could be an issue") | Weighted scoring (Risk × Probability) |

Where ECE is Overhead

ECE is not a replacement for everyday AI planning. For most day-to-day decisions, standard AI is faster and sufficient. ECE's value scales with the cost of getting the decision wrong.

| Problem Type | Use ECE? | Why |
| --- | --- | --- |
| Simple CRUD feature | No | Happy path is obvious. Edge cases are minor. |
| UI layout decision | Probably not | Multiple valid approaches, but consequences are cosmetic. |
| Authentication architecture | Yes | Security edge cases are non-obvious. Consequences are severe. |
| Accessibility for disabled users | Yes | Safety implications. Missing edge cases cause harm. |
| Payment processing | Yes | Financial edge cases compound. Retry storms, race conditions. |
| Multi-service architecture | Yes | Cascading failures. Single-point analysis misses cross-system risks. |

Minimum Viable Workflow

You do not need all 17 commands for every problem. The minimum captures 80% of the value in 10 to 15 minutes:

1. /ece map (~5 min)
2. /ece expand (~5 min, run 1 to 2 times)
3. /ece optimize (~5 min)

The full workflow (with blindspot, synthesize, review, close) is reserved for high-stakes evaluations where thoroughness justifies the 20 to 40 minute investment.

Blindspot Audit on This Evaluation

I ran the blindspot check on my own self-evaluation. It flagged real issues:

- Creator Bias: This evaluation was conducted by ECE's creator. There is inherent anchoring toward proving ECE works. Mitigated by expanding counter-argument nodes and rating them honestly.
- Single Test Case: The strongest evidence (the blind user navigation test) is a single evaluation. Confidence in node A1 was downgraded from High to Medium until more evaluations are documented.
- User-Preference Contamination: The question "does it actually create any upper hand?" implies an expectation of a positive answer, which could unconsciously weight findings.

The self-evaluation demonstrates what ECE is designed for: forcing an honest, structured analysis that includes checking your own bias, even when evaluating your own tool.

Structure

Repository Layout

```
.ece/
  SYSTEM.md                  # Core identity, rules, command registry (~65 lines)
  commands/                  # 17 command files
    map.md
    expand.md
    trace.md
    recommend.md
    validate.md
    blindspot.md
    optimize.md
    synthesize.md
    test-gen.md
    impact.md
    review.md
    close.md
    resume.md
    status.md
    wireframe.md
    patterns.md
    reset.md
  templates/
    spine.template.md        # Standardized evaluation map format
    node.template.md         # Standardized edge case format
  evaluations/               # Runtime data (one directory per problem)
    <problem-slug>/
      spine.md
      nodes/
      solution.md
      reviews/
  history/                   # Completed evaluation summaries
```

Proof of Concept

/ece wireframe in Action

The wireframe below was generated from a real ECE evaluation on the Indoor Navigator project — specifically the Blind Navigation Assistance Button problem. No states were invented. Every toggle condition, acceptance criterion, and edge state derives directly from the solution.md produced by /ece synthesize.

ECE Wireframe output — Blind Navigation Assistance Button

The full session recording below shows the complete flow — running /ece wireframe, toggling conditions (touch geometry, network status, battery level, onboarding state), and watching the state inspector update in real time.

View on GitHub

Conclusion

The Verdict

1. ECE outperforms standard AI planning for complex, high-stakes problems. It maps 12+ edge cases where standard AI identifies 1 to 3, detects bias drift that standard AI does not self-detect, and replaces gut-feel recommendations with quantified scoring.
2. ECE is overhead for simple problems. If the best approach is obvious and the edge cases are minor, standard AI is faster and sufficient. ECE's value scales with the cost of getting the decision wrong.
3. ECE fills a gap that planning frameworks do not address. Planning tools decide what to build. ECE evaluates which approach to take before building begins. They are complementary, not competing.
4. Bias detection is not optional. It is structural. The blindspot check found real drift in every evaluation tested, including this self-evaluation. Risk ratings shifted. Missing perspectives were identified. Creator bias was detected and flagged.

What Remains Unproven

- Independent validation: no one outside the project has run ECE on their own unfamiliar problem yet.
- Broader evaluation dataset: only 2 fully worked examples are documented (blind user navigation, payment gateway).
- Multi-model consistency: tested on one AI model. Results on other models are unverified.
- Team adoption patterns: ECE has only been used by a solo developer so far.

AI is capable of deep analysis. It just defaults to shallow answers unless you build the structure that demands depth. ECE is that structure. Its upper hand is real, but conditional: it emerges for complex problems and disappears for simple ones.

ECE is open source at github.com/rohitnischal01/ece. 17 commands, MIT license, installs into any AI coding environment with npx ece-evaluator ece-install.
