This article evaluates a real experiment: the Design Genome Pipeline, a system that uses Markdown-based instruction files called “Skills” to govern AI-generated code — not just for visual polish, but for building production-ready products.
The experiment tested the same Invoices Dashboard in two ways:
1. Without Skills: “Build me an invoices page”, with no rules and no constraints.
2. With Skills: the same request, with Skills loaded into context (the AI's working memory).
The test deliberately ran on a less capable, widely accessible model. If Skills make a weaker model produce deployable code, they should work even better on stronger ones. We tested the floor, not the ceiling.
Every current AI code tool can generate a good-looking UI. The problem isn't appearance — it's everything else:
Standard AI prompting reliably handles only one of these: appearance. Skills aim to handle all six.
Skills are structured Markdown files that are loaded into an AI model's context window (its working memory for the current session). They act as runtime instructions — not permanent training, not database lookups, but literal rules the AI reads before generating code.
Think of Skills as an employee handbook for AI. Instead of hoping the AI “figures it out,” you hand it a rulebook. If a rule doesn't cover something, the AI stops and asks — instead of improvising.
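To make "loading Skills into context" concrete, here is a minimal sketch. It assumes each Skill is a plain Markdown string and that the model API accepts a single system prompt. The `SkillFile` type, the `buildSystemPrompt` helper, the `<skill>` delimiters, and the fallback wording are hypothetical illustrations, not the pipeline's actual loader.

```typescript
// Hypothetical shape of a Skill: a name plus its raw Markdown rules.
interface SkillFile {
  name: string;
  markdown: string;
}

// Assumed wording for the "stop and ask" behavior; appended after all Skills.
const FALLBACK_RULE =
  "If no Skill covers the request, stop and ask a clarifying question instead of improvising.";

// Concatenates Skills into one system prompt. Each Skill is wrapped in
// explicit delimiters so it is clear where one rulebook ends and the next
// begins. Sorting by name makes the order deterministic regardless of the
// order in which the files were loaded.
function buildSystemPrompt(skills: SkillFile[]): string {
  const sections = [...skills]
    .sort((a, b) => a.name.localeCompare(b.name))
    .map((skill) => `<skill name="${skill.name}">\n${skill.markdown.trim()}\n</skill>`);
  return [...sections, FALLBACK_RULE].join("\n\n");
}
```

Because the prompt is rebuilt from files at the start of each session, editing a Skill file changes the AI's behavior without any retraining, which is what "runtime instructions" means in practice.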
| Skill | Role | Lines | Tokens |
|---|---|---|---|
| filtering-design-systems | The Gatekeeper — Controls what components can exist | 186 | ~2,800 |
| governing-accessibility | The Contrast Police — WCAG compliance | 97 | ~1,500 |
| governing-layouts | The Structural Engineer — Prevents layout drift | 147 | ~1,200 |
| governing-responsiveness | The Adaptive Engine — Breakpoint transformations | 143 | ~1,400 |
| Total Skill Overhead | | 573 | ~6,900 |
Note on 1st Gen Skills: These original Skills were exploratory — intentionally exhaustive. For later projects, these were optimized to consume 40-60% fewer tokens by removing redundancy and merging related rules. On frontier models like Claude Opus, even leaner Skills can be used.
Skills are not a design tool. They are a production engineering framework. The same pattern — structured rules injected into AI context — works across the entire product lifecycle:
| Frontend Skill | Rule |
|---|---|
| Responsiveness Skills | Explicit rules for how each component adapts at each breakpoint. |
| Accessibility Skills | Contrast ratios, keyboard navigation, screen reader compatibility. |
| Layout Lock Skills | Prevents structural drift over long editing sessions. |
| Component Registry | The AI can only use components that actually exist in the codebase. |
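The Component Registry rule can be approximated by a check that runs on generated code before it is accepted. This sketch assumes React/JSX output and a hypothetical registry; the regex is a crude stand-in for a real parser, so its hits are candidates for review rather than verdicts.

```typescript
// Hypothetical registry: the only components that exist in the codebase.
const COMPONENT_REGISTRY = new Set(["Button", "DataTable", "Pagination", "Sidebar", "StatusBadge"]);

// Collects capitalized JSX tag names (React components) from generated
// source and returns any that are not registered, sorted for stable output.
// Limitation: a regex is not a parser. It also matches generic type
// arguments such as Array<Invoice> and misses dynamically chosen components.
function findUnregisteredComponents(source: string, registry: ReadonlySet<string>): string[] {
  const used = new Set<string>();
  const pattern = /<([A-Z][A-Za-z0-9]*)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    used.add(match[1]);
  }
  return Array.from(used)
    .filter((name) => !registry.has(name))
    .sort();
}
```

A hallucinated `<FancyChart />` in otherwise valid output would be reported, giving the AI (or the developer) a concrete list of components to replace with registered ones.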
| Backend Skill | Rule |
|---|---|
| API Contract Skills | Define exact endpoint shapes, preventing hallucinated endpoints. |
| State Management Skills | Enforce where data lives (server vs. client) and how it’s cached. |
| Authentication Skills | Lock down auth patterns (JWT, OAuth) to prevent insecure shortcuts. |
| Database Query Skills | Constrain query patterns to prevent N+1 problems. |
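The N+1 rule is easiest to see by counting round trips. This sketch uses a hypothetical in-memory `CustomerStore` in place of a real database client; `findByIds` stands in for a single `WHERE id IN (...)` query.

```typescript
interface Invoice {
  id: number;
  customerId: number;
}

interface Customer {
  id: number;
  name: string;
}

// Hypothetical data-access layer that counts how many queries it runs.
class CustomerStore {
  queries = 0;
  private readonly rows: Customer[];

  constructor(rows: Customer[]) {
    this.rows = rows;
  }

  // One query per call, like SELECT ... WHERE id = $1.
  findById(id: number): Customer | undefined {
    this.queries++;
    return this.rows.find((c) => c.id === id);
  }

  // One query for many ids, like SELECT ... WHERE id IN (...).
  findByIds(ids: number[]): Customer[] {
    this.queries++;
    const wanted = new Set(ids);
    return this.rows.filter((c) => wanted.has(c.id));
  }
}

// The N+1 pattern the Skill forbids: one customer query per invoice.
function customerNamesNPlusOne(invoices: Invoice[], store: CustomerStore): string[] {
  return invoices.map((inv) => store.findById(inv.customerId)?.name ?? "unknown");
}

// The pattern the Skill requires: collect ids, query once, join in memory.
function customerNamesBatched(invoices: Invoice[], store: CustomerStore): string[] {
  const ids = Array.from(new Set(invoices.map((inv) => inv.customerId)));
  const byId = new Map(store.findByIds(ids).map((c) => [c.id, c] as const));
  return invoices.map((inv) => byId.get(inv.customerId)?.name ?? "unknown");
}
```

With 100 invoices, the first version issues 100 customer queries on top of the query that loaded the invoices; the second issues one, and both return the same names.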
| Testing Skill | Rule |
|---|---|
| Test Coverage Skills | Every new component must include tests for all interaction states. |
| Regression Prevention | Before modifying any component, verify existing tests still pass. |
| Restoration Skills | When rolling back, remove only added code, never modify pre-existing logic. |
| Error Boundary Skills | Exactly how errors should be caught, logged, and reported. |
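One way an Error Boundary Skill might pin down "exactly how errors should be caught, logged, and reported" is to mandate a single wrapper. This is a plain-function pattern, not a React error boundary component. The `Result` type, the `ErrorReporter` interface, and the `runGuarded` helper are hypothetical; the sketch is synchronous for brevity, and an async variant would `await` the task inside the same `try`.

```typescript
// Callers must handle both branches explicitly instead of relying on throw.
type Result<T> = { ok: true; value: T } | { ok: false; error: Error };

// Hypothetical reporter, e.g. backed by a logging or monitoring service.
interface ErrorReporter {
  report(context: string, error: Error): void;
}

// The one sanctioned way to catch errors: normalize anything thrown into an
// Error, report it with a context label, and return a Result.
function runGuarded<T>(context: string, task: () => T, reporter: ErrorReporter): Result<T> {
  try {
    return { ok: true, value: task() };
  } catch (caught) {
    const error = caught instanceof Error ? caught : new Error(String(caught));
    reporter.report(context, error);
    return { ok: false, error };
  }
}
```

Because every failure passes through one function, the Skill can specify the log format and reporting target once instead of per component.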
| Performance Skill | Rule |
|---|---|
| Bundle Size Governance | No component import may exceed 50KB gzipped. If too large, use dynamic import. |
| Image Optimization | All images must use optimized formats (WebP). No unoptimized raw images allowed. |
| Core Web Vitals | Cumulative Layout Shift (CLS) no higher than 0.1. Interaction to Next Paint (INP) no higher than 200 ms. |
| Code Splitting | Route-level splitting mandatory — each page only loads its own code. |
| Font Loading | Use `font-display: swap`. Maximum 2 font families per page. |
| Caching | Static assets must use content-hashed filenames so browsers can cache them long-term with immutable cache headers. |
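Two of the performance rules translate directly into automated checks. This sketch assumes Node.js (for `gzipSync` from `node:zlib`), treats 1 KB as 1,024 bytes, and reads the 200 ms target as Interaction to Next Paint (INP); the function names and input shapes are hypothetical.

```typescript
import { gzipSync } from "node:zlib";

// Bundle Size Governance: 50 KB gzipped per import (assuming 1 KB = 1,024 bytes).
const MAX_GZIP_BYTES = 50 * 1024;

// Size in bytes after gzip compression, which is what the budget measures.
function gzippedSize(source: string): number {
  return gzipSync(Buffer.from(source, "utf8")).length;
}

// Returns the modules that exceed the budget and should be loaded
// with a dynamic import() instead of a static one.
function overBudget(modules: Record<string, string>): string[] {
  return Object.entries(modules)
    .filter(([, source]) => gzippedSize(source) > MAX_GZIP_BYTES)
    .map(([name]) => name);
}

// Core Web Vitals: CLS at most 0.1, INP (assumed reading of "200ms") at most 200 ms.
interface VitalsSample {
  cls: number;
  inpMs: number;
}

function vitalsViolations(sample: VitalsSample): string[] {
  const problems: string[] = [];
  if (sample.cls > 0.1) problems.push(`CLS ${sample.cls} exceeds 0.1`);
  if (sample.inpMs > 200) problems.push(`INP ${sample.inpMs} ms exceeds 200 ms`);
  return problems;
}
```

In production, the vitals sample could come from field data, for example the `web-vitals` package's `onCLS` and `onINP` callbacks, while the bundle check fits naturally into a CI step after the build.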
| Deployment Skill | Rule |
|---|---|
| Pre-Deployment Skills | Run linting, type checking, and tests before any build. If any fails, halt. |
| Environment Skills | Never hardcode configuration values or secrets. All config must be read from environment variables defined in .env files. |
| Migration Skills | Database schema changes must be reversible. Every up migration needs a down. |
| Monitoring Skills | Every deployed endpoint must include health checks. Error rates above 1% trigger automated rollback. |
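The Pre-Deployment and Monitoring rules are both simple gates. In this sketch the command strings are assumptions (typical npm scripts), and the step runner is injected so the halting logic can be tested without spawning processes.

```typescript
// Returns true if the command exited successfully. Injected so the gate
// can be tested without spawning real processes.
type StepRunner = (command: string) => boolean;

// Lint, type check, and test, in that order. These commands are
// assumptions (common npm scripts); substitute your project's own.
const PRE_DEPLOY_STEPS = ["npm run lint", "npx tsc --noEmit", "npm test"];

// Runs each check in order and halts at the first failure.
// The build may start only if every check passed.
function preDeployGate(run: StepRunner): { canBuild: boolean; ran: string[] } {
  const ran: string[] = [];
  for (const step of PRE_DEPLOY_STEPS) {
    ran.push(step);
    if (!run(step)) return { canBuild: false, ran };
  }
  return { canBuild: true, ran };
}

// Monitoring rule: an error rate strictly above 1% triggers rollback.
function shouldRollback(errorCount: number, requestCount: number): boolean {
  if (requestCount === 0) return false; // no traffic means no signal yet
  return errorCount / requestCount > 0.01;
}
```

In Node.js, a real runner could be `(cmd) => spawnSync(cmd, { shell: true, stdio: "inherit" }).status === 0`, using `spawnSync` from `node:child_process`.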
A governed, less capable model produces reliably deployable code. An ungoverned, more capable model produces brilliant but unpredictable code that requires human review of every line. Skills make the difference.
Here's what most people miss: Skills don't just constrain the AI — they constrain the human developer too. And that's a feature, not a limitation.
A junior developer who has never handled WCAG accessibility, database Row-Level Security, or Core Web Vitals optimization gets those standards enforced automatically through Skills — without needing to study them first. The governing-accessibility Skill doesn't just tell the AI to follow WCAG — it teaches the developer what matters and why.
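For the contrast part of WCAG, "what matters and why" comes down to one formula: the contrast ratio between the relative luminances of two colors. The sketch below follows the WCAG 2.x definitions (sRGB linearization, luminance weights, and the Level AA thresholds of 4.5:1 for normal text and 3:1 for large text); the helper names and the `#rrggbb`-only input format are simplifications for illustration.

```typescript
// Converts one 0-255 sRGB channel to linear light (WCAG 2.x definition).
function channelToLinear(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a #rrggbb color: 0 for black, 1 for white.
function relativeLuminance(hex: string): number {
  const m = /^#([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (!m) throw new Error(`Expected #rrggbb, got ${hex}`);
  const r = channelToLinear(parseInt(m[1], 16));
  const g = channelToLinear(parseInt(m[2], 16));
  const b = channelToLinear(parseInt(m[3], 16));
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio: (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21.
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// Level AA: at least 4.5:1 for normal text, 3:1 for large text.
function passesAA(foreground: string, background: string, largeText = false): boolean {
  return contrastRatio(foreground, background) >= (largeText ? 3 : 4.5);
}
```

For example, mid-gray `#767676` on white passes AA for normal text (about 4.54:1), while the slightly lighter `#777777` falls just short (about 4.48:1) but still passes for large text.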
Skills reference a constellation of supporting documents that together form the pipeline:
The AI doesn't design layouts from scratch; it assembles them from Titan's pre-defined structural components, which I built using the Figma MCP server and coded in Antigravity.
The test screen was an Invoices Dashboard: a realistic SaaS screen with sidebar navigation, page-level actions, data filters, a data table, status indicators, and pagination. This type of screen combines layout, data display, user controls, and state management.
The failed, ungoverned run was not incompetent. The AI genuinely tried to build a responsive, production-ready page:
In the governed run, the model followed governing-responsiveness literally. It didn't guess; it followed instructions.
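Following a responsiveness Skill literally amounts to a lookup rather than a judgment call. In this sketch, the breakpoint widths, component names, and per-breakpoint behaviors are hypothetical, not the actual contents of governing-responsiveness; the point is that a missing rule raises an error instead of being improvised.

```typescript
type Breakpoint = "mobile" | "tablet" | "desktop";

// Hypothetical minimum widths in px, smallest first.
const BREAKPOINTS: Array<[Breakpoint, number]> = [
  ["mobile", 0],
  ["tablet", 768],
  ["desktop", 1200],
];

// Hypothetical per-component rules: what each component becomes at each breakpoint.
const RULES: Record<string, Partial<Record<Breakpoint, string>>> = {
  Sidebar: { mobile: "hidden behind menu button", tablet: "icon rail", desktop: "full sidebar" },
  DataTable: { mobile: "stacked cards", tablet: "table, secondary columns hidden", desktop: "full table" },
};

// The widest breakpoint whose minimum width the viewport meets.
function breakpointFor(width: number): Breakpoint {
  let current: Breakpoint = "mobile";
  for (const [name, minWidth] of BREAKPOINTS) {
    if (width >= minWidth) current = name;
  }
  return current;
}

// Deterministic lookup: a missing rule is escalated, never guessed.
function layoutFor(component: string, width: number): string {
  const bp = breakpointFor(width);
  const rule = RULES[component]?.[bp];
  if (rule === undefined) {
    throw new Error(`No responsiveness rule for ${component} at ${bp}; ask before improvising`);
  }
  return rule;
}
```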
Tokens (roughly 4 characters each) are the currency of AI interaction. Skills consume tokens from the context window.
| Resource | Tokens | % of 128K context |
|---|---|---|
| 4 Governance Skills | ~6,900 | 5.4% |
| Contracts (9 files) | ~3,500 | 2.7% |
| Registry + Policies | ~4,200 | 3.3% |
| Pipeline Governance | ~800 | 0.6% |
| Total Pipeline Overhead | ~15,400 | 12.0% |
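The percentages in the table follow from simple arithmetic. This sketch assumes "128K" means 128,000 tokens, which reproduces the table's figures, and uses the rough 4-characters-per-token heuristic; real tokenizers vary by model and by text.

```typescript
// Rough heuristic from the article: about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Percentage of the context window, assuming "128K" = 128,000 tokens.
function contextShare(tokens: number, windowTokens = 128_000): number {
  return (tokens / windowTokens) * 100;
}

// Token counts taken from the pipeline overhead table.
const overhead: Record<string, number> = {
  "4 Governance Skills": 6_900,
  "Contracts (9 files)": 3_500,
  "Registry + Policies": 4_200,
  "Pipeline Governance": 800,
};

const total = Object.values(overhead).reduce((sum, tokens) => sum + tokens, 0);

for (const [name, tokens] of Object.entries(overhead)) {
  console.log(`${name}: ${contextShare(tokens).toFixed(1)}%`);
}
console.log(`Total: ${total} tokens, ${contextShare(total).toFixed(1)}%`);
```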
| Version | Tokens | % of 128K context | Notes |
|---|---|---|---|
| 1st Generation | ~15,400 | 12.0% | Verbose, exploratory |
| Optimized (2nd Gen) | ~8,000–9,000 | ~7% | Merged rules, shorthand |
| Frontier-Optimized (3rd Gen) | ~4,000–5,000 | ~4% | Minimal rules for strong models |
| Metric | Without Skills | With Skills |
|---|---|---|
| Responsive behavior | Attempted but fragile | Rule-based and stable |
| Layout stability | Degrades visibly | Consistent |
| Component hallucination | Present in long sessions | Eliminated |
| Accessibility | Inconsistent | Rules-enforced |
| Correction rounds | 3-4 (50K+ tokens each) | 0-1 |
| Net token cost | Higher (corrections) | Lower (upfront) |
The Design Genome Pipeline was a controlled test. Digihive answers the real question: do Skills survive weeks of development on a real, deployed product?
Digihive was built across hundreds of prompting rounds over weeks. By prompt 50, a standard AI starts “forgetting” how the sidebar works. By prompt 100, it reinvents the deletion pipeline. By prompt 150, spacing values drift. Skills fix this by encoding critical decisions outside the conversation history.
The conversation is ephemeral; the Skills are permanent. This is the real power of Skills: they are the product's institutional memory, not the AI's.
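Why Skills survive long sessions can be sketched as a context builder that pins the Skills and lets old conversation fall away. The `buildContext` function, the message shape, and plain truncation are assumptions for illustration; real tools may summarize or cache history instead of truncating it.

```typescript
interface Message {
  role: "user" | "assistant";
  content: string;
}

// The article's rough 4-characters-per-token heuristic.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Builds the context for one turn. The Skills are always included in full;
// the remaining budget is filled with the most recent messages, newest first.
// Older messages drop out of the window, but the Skills never do.
function buildContext(
  skills: string,
  history: Message[],
  budgetTokens: number,
): { system: string; messages: Message[] } {
  let remaining = budgetTokens - estimateTokens(skills);
  if (remaining < 0) throw new Error("Skills alone exceed the context budget");
  const kept: Message[] = [];
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimateTokens(history[i].content);
    if (cost > remaining) break; // everything older than this is dropped
    kept.unshift(history[i]);
    remaining -= cost;
  }
  return { system: skills, messages: kept };
}
```

By prompt 150, the early decision about how the sidebar works may be long gone from `messages`, but if it is written into a Skill, it is still in `system`.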
| Skill | Lines | Purpose |
|---|---|---|
| error-handling-patterns | 642 | Circuit breakers, retry with backoff, graceful degradation |
| performance-optimization | 218 | CDN strategy, bundle budgets, Core Web Vitals targets |
| react-best-practices | 70 | Waterfall elimination, bundle optimization, re-render prevention |
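Retry with backoff, one of the patterns error-handling-patterns covers, can be sketched as follows. This uses "full jitter" exponential backoff, one common variant and not necessarily what that Skill prescribes; circuit breakers and graceful degradation are not shown, and all names are hypothetical.

```typescript
interface RetryOptions {
  maxAttempts: number; // total tries, including the first one
  baseDelayMs: number; // upper bound for the first retry's delay
  maxDelayMs: number; // cap on any single delay
}

// Exponential backoff with "full jitter": a random delay between 0 and
// min(maxDelayMs, baseDelayMs * 2^retry). `random` is injectable for tests.
function backoffDelay(retry: number, opts: RetryOptions, random: () => number = Math.random): number {
  const ceiling = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** retry);
  return Math.floor(random() * ceiling);
}

const sleep = (ms: number): Promise<void> => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries a failing async task, waiting longer on average after each
// failure, and rethrows the last error once attempts are exhausted.
async function withRetry<T>(
  task: () => Promise<T>,
  opts: RetryOptions,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (attempt < opts.maxAttempts - 1) await wait(backoffDelay(attempt, opts));
    }
  }
  throw lastError;
}
```

Jitter spreads retries from many clients over time, so a briefly unavailable service is not hit by a synchronized wave of retries.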
Notice the pattern: Skills get leaner with practice. The react-best-practices Skill is only 70 lines, a sign that Skills can be compact and still effective.
| Dimension | Design Genome | Digihive |
|---|---|---|
| Scope | 1 screen | Full application |
| Duration | Single session (~2h) | Weeks of development |
| Backend | None (static UI) | Supabase: Auth, DB, RLS, Sync |
| Files | ~5 | 70+ |
| Production Skills | 4 (UI governance) | 7 (UI + backend + perf) |
| Deployment | Local preview | Live at digihive.space |
~9,300 tokens upfront vs. 200,000+ tokens in corrections over a long project. The larger the project, the higher the return on the Skill investment.
Skills are a pattern, not a product. The same structure works anywhere AI generates code:
| Area | Example Skill | Impact |
|---|---|---|
| SEO | Unique titles, meta descriptions, single h1, structured data | Prevents SEO-blind pages |
| Security | No plain text passwords. All input sanitized. Parameterized SQL. | Prevents common vulnerabilities |
| Internationalization | No hardcoded strings. All text references translation files. | Ensures translatable app |
| Analytics | Every user action triggers a tracking event with defined properties. | Consistent usage data |
| Documentation | Every exported function includes JSDoc with types and examples. | Docs as a byproduct of dev |
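As an example of how the SEO row could become an automated check, here is a crude sketch that uses regular expressions instead of an HTML parser. It covers the single-`<h1>`, title, uniqueness, and meta-description rules but not structured data; the `PageMeta` shape and function names are hypothetical.

```typescript
interface PageMeta {
  path: string;
  html: string;
}

// Extracts the <title> text, or "" if there is none.
function titleOf(html: string): string {
  const match = /<title>([^<]*)<\/title>/i.exec(html);
  return match ? match[1].trim() : "";
}

// Checks each page for exactly one <h1>, a non-empty unique title,
// and a meta description. Returns human-readable problems.
function seoProblems(pages: PageMeta[]): string[] {
  const problems: string[] = [];
  const seenTitles = new Map<string, string>(); // title -> first path using it
  for (const { path, html } of pages) {
    const h1Count = (html.match(/<h1[\s>]/gi) ?? []).length;
    if (h1Count !== 1) problems.push(`${path}: expected 1 <h1>, found ${h1Count}`);

    const title = titleOf(html);
    if (title === "") {
      problems.push(`${path}: missing <title>`);
    } else if (seenTitles.has(title)) {
      problems.push(`${path}: duplicate title "${title}" (also on ${seenTitles.get(title)})`);
    } else {
      seenTitles.set(title, path);
    }

    if (!/<meta\s+name=["']description["']/i.test(html)) {
      problems.push(`${path}: missing meta description`);
    }
  }
  return problems;
}
```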
The Design Genome Pipeline was the experiment. Digihive is the proof. Together, they show that Skills are not a research curiosity but a production engineering framework that works at scale.
Products built and governed using this workflow:
This evaluation aims to be balanced, but the author acknowledges that the experiment was designed by the same team that built the pipeline. The failed experiment's genuine responsiveness attempts are documented fairly. Digihive is referenced as a deployed product that used the same governance methodology. An independent replication with adversarial test selection would strengthen these findings.