Rick

Rick
Rick

Thursday, March 19, 2026

Agent-Memory - The key to salient episodic memory for AI Agents

 

Agent-Memory - The key to salient episodic memory for AI Agents

Unlocking Persistent Context: Solving AI Agent Amnesia with Advanced Episodic Memory Systems

Tired of your AI coding assistant forgetting crucial details after every session? Discover how a Rust-powered memory system can solve the amnesia problem and provide your AI with the persistent context it needs to boost your productivity! Dive into the future of AI with our latest article on Agent-Memory! #AIMemory #CodingAssistant

Summary: A Rust-powered, append-only, episodic, salient conversational memory system addresses the amnesia problem in AI coding assistants by providing episodic memory, allowing agents to recall past conversations and decisions. It distinguishes between library memory (static documents) and episodic memory (dynamic conversation history), utilizing a six-layer cognitive stack for efficient retrieval. Salience detection prioritizes important memories, while an append-only design ensures raw events are retained indefinitely. The system supports multi-agent memory sharing through various strategies, enhancing collaborative coding experiences. It also support index eviction to keep the most recent and most salient memories in the index without blowing up you RAM budget. 

Getting Started with Agent Brain: From Zero to Semantic Search in 15 Minutes



Getting Started with Agent Brain: From Zero to Semantic Search in 15 Minutes

Mastering Local Semantic Search with Agent Brain: A Step-by-Step Guide 

Beyond the AI Coding Hangover: How Harness Engineering Prevents the Next Outage

 

Beyond the AI Coding Hangover: How Harness Engineering Prevents the Next Outage

Navigating the Challenges of AI Code Generation: Implementing Harness Engineering to Ensure Reliability and Safety

Is your AI coding tool a double-edged sword? Discover how the latest engineering strategies can prevent outages and safeguard your projects from the chaos of unchecked AI-generated code. Don’t let the AI coding hangover catch you off guard! Read more to learn how to build a robust harness for your software development.

It’s 2am. Your phone starts screaming. You grab it, see the ops alert, and your stomach drops. The post-mortem will say “erroneous software code deployment,” but you already know the real story: an AI agent did something nobody tested, nobody caught, and now you’re in a bridge call with six engineers trying to figure out why orders are failing across the entire ecommerce platform. That scenario is not hypothetical. It happened at Amazon in March 2026. The company that builds the AI coding tools had its own orders disrupted by code its own engineers shipped with AI assistance. If it happened to them, it can happen to you.

David Linthicum put a name to what many of us have been quietly living through. His March 2026 InfoWorld piece, “The AI Coding Hangover”, landed because it said out loud what senior engineers have been saying in Slack DMs for months: we moved fast with AI, we shipped a lot of code, and now the operational debt is coming due.

The AI coding hangover is real. But “AI is bad” is the wrong diagnosis, and “write better prompts” is the wrong cure. Here’s what’s actually broken; here’s what you can do about it starting Monday morning.

AI Coding Hangover — Table of Contents

  • The Promise Was Real. So Is the Hangover.
  • The Number That Should Keep You Up at Night
  • You’ve Been Optimizing the Wrong Layer
  • The Three Layers of a Working Harness Engineering System
  • Open-Source Tools That Build the Harness
  • Build the Harness, Not Just the Prompt
  • References

AI Coding Key Takeaways

  • The AI coding hangover is real: 42% of production code is now AI-generated, and defect rates are 1.7x higher per AI pull request than human-written code.
  • The verification gap is the core problem: 96% of developers distrust AI-generated code, yet only 48% always verify it before committing. This is verification debt.
  • Context engineering alone is not enough: Optimizing prompts and RAG pipelines addresses single-inference quality but cannot prevent systemic failures across time.
  • Harness engineering is the missing layer: Architectural constraints and feedback loops (above context engineering called harness engineering) are what keep AI-generated code production-safe.
  • Amazon proved the need: The world’s most sophisticated cloud engineering org added mandatory sign-off gates after a pattern of high-blast-radius AI-assisted incidents.
  • Three open-source tools map to the three layers: Agent Brain (context), Agent RuleZ (constraints), Agent Skills (feedback) — each MIT-licensed and production-ready.
Press enter or click to view image in full size
Shift, Risk and Harness Engineering for AI code development

AI Coding: The Promise Was Real. So Is the Hangover.

Let’s be honest about the excitement before we talk about the crash.

AI coding tools are genuinely transformative. Between 82% and 91% of developers now use AI tools weekly. GitHub Copilot crossed 20 million users. According to Sonar’s 2026 State of Code Developer Survey, 42% of all committed production code is now AI-generated, and developers expect that to reach 65% by 2027. That’s not a trend. That’s a structural shift in how software gets built.

Press enter or click to view image in full size
The AI Hangover

The productivity gains are real. Tasks that took days genuinely take hours now. Junior engineers produce at senior rates for narrow, well-defined problems. Documentation that never got written gets written. The tools work.

The problem was not the tools. The problem was the story executives told themselves about what the tools could replace.

“Software will be free.” That framing was everywhere in 2023 and 2024. The assumption underneath it: writing code is software engineering. If AI writes code, you can shrink engineering teams. If AI writes code fast, you can ship everything fast. If code is free, complexity is free.

“Software will be free.” That framing was everywhere in 2023 and 2024. The assumption underneath it: writing code is software engineering.

Linthicum calls this the fundamental misunderstanding. Code typing is a small fraction of what software engineering actually is. The hard parts (requirements definition, data quality, security modeling, performance engineering, operational design, architecture trade-offs) do not get solved by an LLM generating functions faster. When you remove human engineering judgment from those decisions, the cost does not disappear. It defers.

And deferred cost compounds.

Press enter or click to view image in full size
The promise versus the hangover

The AI coding hangover did not arrive suddenly. It built across four years as adoption outpaced governance, culminating in high-profile production incidents and an industry-wide recognition that writing code faster is not the same as engineering software better.

CodeRabbit’s December 2025 “State of AI vs Human Code Generation Report” gives us the numbers. Across 470 open-source GitHub pull requests, AI-generated code produced 1.7 times more issues per PR than human-written code: 10.83 issues per AI PR versus 6.45 for human PRs. Major issues are 1.7 times higher. Readability problems are 3.15 times higher. Security issues are 1.56 times higher.

AI doesn’t eliminate complexity. It generates complexity faster.

Linthicum observes what many of us are seeing firsthand: companies that bet on AI as a replacement are quietly rehiring. Platform engineers, specifically, are being brought back to untangle the sprawling systems that grew without architecture reviews or operational planning. What Linthicum calls “mini-enterprises” (teams that started with small coding tasks, then generated modules, then entire services, with no governance and no system intent) are now operational problems requiring human engineers to fix.

The hangover is expensive. And it’s going to get worse before the industry builds the right response.

Press enter or click to view image in full size
There is hope, you can escape the AI Coding Hangover by adopting Harness Enginnering

AI Coding: The Number That Should Keep You Up at Night

Here is the statistic I keep coming back to.

According to Sonar’s 2026 survey, 96% of developers distrust AI-generated code’s functional correctness. Nearly every developer working with AI tools today. Only 3% say they highly trust the output. Nine out of ten people using these tools actively doubt what the tools produce.

Now here’s the number that keeps me up at night: only 48% of those same developers always verify AI-generated code before committing.

Press enter or click to view image in full size
The number that should keep you up at night about AI generated code

Let that sit for a moment. We don’t trust it. And roughly half of us ship it anyway.

💡 The Verification Paradox: 96% of developers distrust AI-generated code. Only 48% always verify it. We know it’s untrustworthy, and we ship it anyway.

Sonar’s 2026 survey reveals a 48-point gap between awareness and action. Ninety-six percent of developers distrust AI-generated code for functional correctness, yet only 48 percent always verify it before committing. This gap, the Verification Debt Zone, is not a skills problem. It’s a structural failure in how teams have integrated AI into their engineering workflow.

Verification Debt

Werner Vogels, AWS CTO, described this structural gap at re:Invent 2025 with a term that has stuck: “verification debt.” He put it precisely: “When you write code yourself, comprehension comes with the act of creation. When the machine writes it, you’ll have to rebuild that comprehension during review. That’s what’s called verification debt.” (Note: Kevin Browne first coined the term in December 2024; Vogels gave it industry-wide visibility at re:Invent.)

Press enter or click to view image in full size
Bridging the AI Verification Gap

Security: Syntactically Correct Does not mean Semantically Secure

The security dimension of this gap is where the consequences become unavoidable. Veracode’s 2025 GenAI Code Security Report, testing code from over 100 large language models, found that 45% of AI-generated code samples failed OWASP Top 10 security tests. By language: Java failed at 72%, C# at 45%, JavaScript at 43%, Python at 38%. Syntactically correct code is not the same as semantically secure code. AI produces the former reliably. The latter requires something the model cannot provide on its own.

Press enter or click to view image in full size
Syntactically Correct Does not mean Semantically Secure

The Amazon incidents are the most concrete evidence we have that verification debt has real blast radius.

In December 2025, an approximately 13-hour outage hit AWS’s Cost Explorer in the Mainland China region. Reports describe engineers using Amazon’s Kiro AI coding tool; the agent deleted and recreated the entire environment. Amazon attributed the outage to a misconfigured IAM role and disputed AI as the root cause. The safeguards they put in place after the incident are more revealing than the attribution.

In March 2026, a roughly 6-hour outage disrupted Amazon’s main ecommerce platform: orders, account details, product pages. Amazon described it as an “erroneous software code deployment” and acknowledged a “trend of incidents” with high blast radius from AI-assisted changes. They disputed direct AI causation again. But they mandated that even senior engineers get code signed off before deployment — a governance control added specifically in response to the pattern of incidents.

Here’s what I take from that: when the world’s most sophisticated cloud engineering organization (the company that builds the AI coding tools) responds to an operational pattern by adding mandatory review gates, that is not a political decision. That is an engineering organization recognizing it needed a harness.

💡 The AWS Data Point: Amazon added mandatory senior engineer sign-off for AI-assisted code deployments after acknowledging a “trend of incidents” with high blast radius. The company that builds the AI tools concluded it needed harness controls. That matters.

Move from AI Coding to AI Engineering: You’ve Been Optimizing the Wrong Layer: Context Engineering Is Not Enough

So what has the industry done in response to the quality and security problems? Predictable: better prompts, better RAG, better context. A whole field called context engineering has grown up around improving what the model sees at inference time.

Context engineering is real and valuable. Let me define it precisely so I can explain why it isn’t enough.

Press enter or click to view image in full size
Context Engineering is not enough

Context engineering is the design and management of everything the LLM sees during a single inference: the system prompt, the tool definitions, the RAG results pulled from your codebase, the message history, the output schemas, the memory from prior sessions. The question it answers is: “What do we show the agent?”

Good context engineering reduces hallucinations. It produces better-structured output. It makes a single inference more likely to succeed. These are real improvements.

But context engineering has a fundamental ceiling.

Every new inference starts fresh. The model has no memory of its own failures. If it makes a mistake today and you fix the context tomorrow, nothing prevents the exact same mistake next Thursday when the context shifts again. Context engineering optimizes the single inference, not the system across time.

The 96/48 verification gap is proof that context engineering has not solved the quality problem. We have had years to refine our prompts, improve our RAG pipelines, tune our system instructions. And still, 96% of developers don’t trust what comes out.

Agent Harness Engineering

What’s missing is the layer above context: harness engineering.

Birgitta Boeckeler, writing on Martin Fowler’s site in February 2026, gave us the formal definition. She describes harness engineering as having three components:

  1. Context Engineering: the continuously enhanced knowledge base in the codebase, plus dynamic context from observability data
  2. Architectural Constraints: monitored not only by LLM-based agents, but also by deterministic custom linters and structural tests
  3. Garbage Collection: agents that run periodically to find inconsistencies in documentation or violations of architectural constraints

Read that first component carefully. Context engineering is a subset of harness engineering. Not a parallel discipline. Not a competitor. A subset.

Press enter or click to view image in full size
Three layers of harness engineering

Context engineering (prompts, RAG, tool definitions, memory, and output schemas) is the innermost layer of the harness, not the whole thing. Boeckeler’s three-component model adds Architectural Constraints (deterministic linters, structural tests, security gates) and Feedback Loops / Garbage Collection (review gates, CI/CD hooks, drift detection) as required outer layers. Optimizing only the inner layer while leaving the outer layers unbuilt is why the verification gap persists.

Mitchell Hashimoto said it most practically. In his essay “My AI Adoption Journey” on mitchellh.com, he wrote:

💡 “I’m making an earnest effort whenever I see an agent do a Bad Thing to prevent it from ever doing that bad thing again.”

That single sentence is the philosophy of harness engineering. Not “let me fix this prompt.” Not “let me improve the context.” Let me engineer a solution such that this class of failure cannot recur. When Hashimoto built this approach into his Ghostty project, he kept an AGENTS.md file where each line corresponds to one specific bad behavior he corrected. “Each line in that file is based on a bad agent behavior, and it almost completely resolved them all.”

Context engineering asks: “What do we show the agent?”

Harness engineering asks: “What do we prevent, measure, control, and fix?”

Different questions. Different engineering work. Right now, most teams are only doing the first one.

The model is the CPU. The harness is the OS. You’ve been tuning the CPU without building the OS.

Press enter or click to view image in full size
The AI Harness is the aspirin to the AI Coding Headache

The Three Layers of a Working Harness Engineering System for AI Coding Agents

What does a working harness actually look like in practice? Three layers, each addressing a different failure mode. Together, they change AI coding from “code generation I can’t fully trust” to “engineering automation I can put in production.”

Without a harness, AI-assisted development becomes a loop: code ships fast, incidents follow, hotfixes get pushed, and the same failure recurs next sprint. The harness model interrupts that loop with gates before production and a feedback mechanism that converts each failure into a permanent improvement. Amazon’s mandatory senior engineer sign-off is the review gate, implemented by hand. The harness automates it.

Harness Engineering Layer 1: Context Engineering Done Right — making AI Coding into Agents AI Coding Engineers

This is where most teams already spend their time, and it matters. Layer 1 means the agent knows your system. Not just “here’s a system prompt”; the agent knows your actual architecture, your failure history, and which services own which data.

Not just a CLAUDE.md file (though that’s a start). The agent needs semantic indexing of your codebase so it can retrieve relevant context at inference time. It needs persisted architectural decisions it can access across sessions. It needs that context kept current as the codebase evolves.

Spec driven development is a real factor here. Check out BSD, BMAD, SpecKit, Superpowers, and their ilk.

When context is done right, the agent stops reinventing what already exists, stops violating patterns that are already established, and stops making category errors about which services own which data. Those three failure modes alone account for a huge share of the operational debt Linthicum is describing.

Harness Engineering Layer 2: Architectural Constraints for AI Coding Agents

Layer 2 means the agent cannot break certain invariants even if it tries. This is where context engineering alone fundamentally fails.

A well-prompted agent might make a reasonable mistake anyway. A system with architectural constraints stops that mistake before it ships. Deterministic constraints don’t negotiate: type checking passes or it doesn’t. Security linters flag specific vulnerability patterns or they don’t. Structural tests catch architecture violations. Think of it like a circuit breaker for your AI agent; it doesn’t matter how good the agent is; the breaker trips on specific conditions, period.

This layer directly addresses the Veracode finding. If 45% of AI-generated code fails OWASP Top 10 tests, and you run OWASP-aligned security checks in your CI pipeline before any AI-generated code merges, the exposure drops dramatically. The constraint layer doesn’t trust the model to get security right. It verifies independently.

Amazon’s mandatory sign-off policy is a manual implementation of Layer 2. A senior engineer reviewing AI-generated changes before deployment is a constraint gate. Expensive, slow, and exactly what you need when you don’t have the automated equivalent. The harness automates what Amazon is doing by hand.

Press enter or click to view image in full size
Move Beyond Context Engineering and towards Harness Engineering

Harness Engineering Layer 3: Feedback Loops and Garbage Collection for AI Coding Agents

Layer 3 is the harness’s immune system. The system learns from failures and prevents their recurrence. In my experience, this is the layer most teams skip; and it’s the one that turns the harness from a one-time investment into a compounding asset.

This is Boeckeler’s “garbage collection” component. Agents that run periodically to find drift, inconsistencies, and violations. CI/CD hooks that capture failure patterns. Review gates that feed information back into the context layer. Every incident involving AI-assisted code becomes a rule in the constraint layer, a memory in the context layer, or a pattern in the workflow layer.

Amazon’s post-incident response is Layer 3 at the organizational level. They had incidents, ran post-mortems, added controls. The harness does this systematically and continuously rather than reactively and manually.

Mitchell Hashimoto’s AGENTS.md is Layer 3 in its simplest form: every bad behavior becomes a documented constraint. One line per incident. Each line is an improvement to the system.

Together, the three layers create a system that degrades gracefully when the model fails, learns from those failures, and gets progressively more reliable over time. Without the harness, each AI failure is an incident. With the harness, each AI failure is an improvement.

Open-Source Tools That Build the Harness

Frameworks are useful. Concrete tools you can install Monday morning are better. Adopt Spec Driven Development with GSD, BMAD, SpecKit, OpenSpec, Superpowers, etc.

Three open-source tools from Spillwave Solutions map directly to the harness layers. Each one is MIT-licensed, actively maintained, and solves a specific documented problem.

Harness engineering solution stack showing Agent Brain for cross-project context, Claude Code AI Agent, Agent RuleZ for policy enforcement, and Agent Skills for validated workflow patterns with feedback loops connecting all layers.

Press enter or click to view image in full size
Implementing the Harness in Code, Not Just Policy

The three open-source tools implement the three harness layers in code rather than policy. Agent Brain provides cross-project context and architectural memory. Agent RuleZ enforces deterministic security and policy constraints at sub-10ms latency. Agent Skills provides repeatable, validated workflow patterns. The feedback loop at the bottom converts each production incident into a permanent harness improvement; the system learns where context engineering alone cannot. We are also working on Agent Memory which is more about monitoring the conversation for salient information and the indexing it and making it available so you coding agent or agent does not forget; alas Agent Memory is still in development.

Press enter or click to view image in full size
Harness engineering solution stack showing Agent Brain for cross-project context, Claude Code AI Agent, Agent RuleZ for policy enforcement, and Agent Skills for validated workflow patterns with feedback loops connecting all layers

Agent Brain: Harness Engineering Layer 1 at Scale

Agent Brain is a private RAG system built as a Claude Code plugin. Its purpose is simple and specific: give AI agents accurate, retrievable knowledge of your actual system.

The problem it solves: Linthicum describes “mini-enterprises” — fragmented systems built by teams operating without shared context, no architecture reviews, no system intent. Agent Brain solves this by creating a continuously updated semantic index of your codebase that persists across sessions and spans multiple projects. It also does this for you documents: design, PRD, specs., governance, etc.

What it does:

  • Semantic indexing with multi-mode retrieval: hybrid, semantic/vector, BM25 keyword, and knowledge graph modes
  • AST-aware code chunking that understands code structure across 10 programming languages
  • Cross-project context so the agent knows how services relate to each other, not just what’s in the current file
  • Architectural memory that persists decisions from one session to the next

When your agent knows that the order service owns payment state, and the inventory service should never write to the payment database, it stops making a whole class of architectural mistakes. That knowledge lives in the harness, not in the prompt you happened to write today.

Repo details: MIT license, 61 stars, 435 commits as of March 2026. Actively maintained.

Agent RuleZ: Harness Engineering Layer 2 Deterministically

Agent RuleZ is described as “a deterministic, auditable, local-first AI policy engine for Claude Code.” It is the architectural constraint layer made operational.

The problem it solves: The Veracode finding is stark — 45% of AI-generated code fails OWASP Top 10 tests. Layer 2 without a tool means relying on CI pipelines that run after code is written, reviewed, and ready to merge. Agent RuleZ intercepts earlier, at the agent event level, before bad code is generated and committed.

What it does:

  • Intercepts Claude Code events at sub-10ms latency
  • Evaluates policy rules expressed in YAML configuration (not raw JSON, which matters for maintainability)
  • Blocks risky operations deterministically: no model negotiation, no prompt override
  • Produces an auditable policy trail so you know exactly what rules fired and when

Built in Rust for the performance characteristics that make sub-10ms evaluation possible at the tool-call level. A policy that says “never delete a production database without an explicit confirmation step” executes in microseconds and cannot be talked out of it by a clever prompt.

This is the technical equivalent of Amazon’s mandatory sign-off, automated and running at every inference rather than at merge time.

Repo details: MIT license, agent_rulez (with underscore) on GitHub. Rust-based.

Here is a sample Agent RuleZ YAML policy with two rules. Read it top to bottom — each rule is a named policy object with three required fields:

rules:
- name: prevent-production-deletion
event: tool_call # WHEN: fires on any tool invocation
condition:
tool: bash # IF: the tool being called is bash
pattern: "DROP TABLE|DELETE FROM|rm -rf" # AND: the command matches these destructive patterns
environment: production # AND: the current environment is production
action: block # THEN: block the operation entirely
message: "Destructive operations in production require explicit confirmation. Add --confirm flag."
- name: require-owasp-scan
event: file_write # WHEN: fires whenever the agent writes a file
condition:
file_extension: [".java", ".js", ".py"] # IF: the file is a source file in these languages
action: require_check # THEN: gate on a passing check result
check: owasp_linter # SPECIFICALLY: run the OWASP linter before allowing the write

Walk through what each field does:

  • name: A human-readable label. This shows up in audit logs, so name it like a policy document.
  • event: The Claude Code lifecycle hook where the rule fires. tool_call intercepts before execution. file_write intercepts before the file lands on disk. The rule fires early -- before the bad thing happens.
  • condition: The filter criteria. All conditions must match for the rule to fire. The pattern field uses regex, so you can match multiple destructive commands in one rule.
  • action: What happens when the condition matches. block stops the operation entirely. require_check gates on an external validator passing. There is no "warn" action that the model can acknowledge and continue -- that path exists, but block means block.
  • message: What the developer sees when the rule fires. This is the teachable moment. Write it like a colleague explaining why the guardrail exists.

Now here’s a second example showing something different: encoding an architectural decision, not just a security policy.

rules:
- name: enforce-service-ownership
event: file_write # WHEN: fires on any file write
condition:
file_path: "services/inventory/**" # IF: writing inside the inventory service
content_pattern: "payment_db|orders_schema" # AND: the content references payment or orders data
action: block
message: "Inventory service must not access payment or
orders schemas directly.
Use the PaymentService API. See ADR-012."

- name: require-migration-on-schema-change
event: tool_call # WHEN: a tool is called
condition:
tool: bash
pattern: "ALTER TABLE|CREATE TABLE|DROP COLUMN" # IF: it's a schema mutation
action: require_check
check: migration_exists # THEN: verify a migration file exists for this change
message: "Schema changes require a versioned migration file. Run: make migration-create NAME=<description>"

These two rules encode decisions from architecture reviews as machine-enforceable constraints. The first rule encodes the service ownership boundary: the inventory service does not touch payment data, full stop. The second encodes the team’s migration discipline: schema changes without migration files do not ship. Neither of these rules is a prompt. Neither can be overridden by rephrasing the request. They are not suggestions to the model; they are constraints on the model’s actions.

Press enter or click to view image in full size
Agent RuleZ can block actions and prevent problems. Unlike Markdown rules which are strong suggestions or CLAUDE.md or AGENTS.md, Agent RuleZ takes action.

This is Hashimoto’s AGENTS.md philosophy implemented at the enforcement layer, not just the documentation layer. Each rule corresponds to one real failure mode. The comment in the message field is the architecture review note, made executable.

💡 Why Rules Beat Post-Commit CI for Architecture: A CI check runs after the code is written, reviewed, and staged for merge. By then, the developer has context-switched away, the PR is open, and fixing it is friction. Agent RuleZ fires at the moment of generation — when the agent writes the file, before the developer even sees it. The cost of enforcement drops from “fix a PR review comment” to “the agent gets it right on the first pass.”

Agent Skills: Harness Engineering Layer 3 Patterns

Agent Skills is the Claude Code plugin system for repeatable, validated workflow patterns. Where Agent Brain gives the agent knowledge and Agent RuleZ enforces constraints, Agent Skills ensures the agent follows consistent execution sequences.

The problem it solves: Inconsistency across teams and between sessions. Left to its own devices, an AI agent invents a slightly different deployment sequence every time. Different engineers get different results. Lessons learned in one session don’t carry to the next. Agent Skills packages validated workflows into reusable patterns that execute consistently.

What it does:

  • Encapsulates multi-step workflows as executable skill sequences
  • Validates each step before proceeding to the next
  • Carries institutional knowledge about how tasks should be executed in your specific environment
  • Prevents the common failure mode of agents “helpfully” taking shortcuts that skip important validation steps

This is Layer 3’s garbage collection component made actionable: instead of cleaning up after failures, Skills prevents the failures by providing the right execution path in the first place.

Press enter or click to view image in full size

Together, the three tools cover the full harness:

Press enter or click to view image in full size

The harness is not a single tool. It’s a layered system. These three tools build that system.

Build the Harness, Not Just the Prompt

The AI coding hangover is real. Linthicum named it. The data confirms it. Amazon lived it publicly enough that we all have a case study with timestamps.

This is not an indictment of AI tools. It’s an engineering diagnosis: powerful technology deployed without the engineering infrastructure it requires. We have been here before. Every powerful production technology needed a harness; databases needed transactions and connection pools, distributed systems needed circuit breakers and retries, cloud infrastructure needed IaC and drift detection. AI coding tools need a harness too. The pattern is as old as production software.

AWS’s response proves the point from the most credible possible direction. The company that builds the AI tools, employs some of the world’s best engineers, and runs the infrastructure that half the internet depends on — that company added mandatory sign-off requirements and governance controls in response to a pattern of high-blast-radius incidents involving AI-assisted changes. Not because AI is bad. Because powerful tools without harness controls produce unacceptable operational risk.

Press enter or click to view image in full size
Building the Harness for Production Safe Code

If AWS needed the harness, so do you.

Here’s where to start. These four steps build on each other — each one makes the next more effective.

This week: Create an AGENTS.md file. Document every bad AI behavior you have personally seen. One line per behavior, one line per incident. Don’t try to make it comprehensive. Write down what actually happened in your system. This is Hashimoto’s minimum viable harness. Not sophisticated — and it works. This step costs nothing and takes an hour. It also forces you to look honestly at what has gone wrong, which is the prerequisite for everything that follows.

This month: Add Agent RuleZ with one security rule. Start with the class of failure most likely to cause a production incident in your specific environment. A rule that prevents destructive database operations without confirmation. A rule that blocks deployment to production without a tagged version. One rule, enforced deterministically. The AGENTS.md you wrote last week tells you exactly which rule to write first — it’s the incident that cost the most hours. These rules do not just inform they take action and block.

This quarter: Set up Agent Brain for cross-project context. Index your core services. Index your document repos. Give your agents architectural memory. Measure the reduction in category errors: the wrong service touching the wrong data, the reinvented utility that duplicates an existing one, the architectural decision that contradicts what was decided last quarter. The rules you added with Agent RuleZ will generate data about which architectural boundaries get crossed most often; that’s your indexing priority list.

Ongoing: Every production incident involving AI-assisted code is a harness improvement opportunity. What rule would have caught this? What context would have prevented it? What workflow pattern would have taken the shortcut off the table? Post-mortem the harness, not just the code. Without this step, the first three steps are static. With it, the harness compounds; each incident makes the system more reliable.

Press enter or click to view image in full size
Build the Harnesss, Not just the Prompts

The model is the CPU. The harness is the OS. You wouldn’t run a production system on bare metal. Don’t run your AI coding pipeline that way either.

Press enter or click to view image in full size
Move from AI Coding Hangover to Engineering Rigor

References

Press enter or click to view image in full size

About the Author

Rick Hightower is a technology executive and data engineer who led ML/AI development at a Fortune 100 financial services company. He created skilz, the universal agent skill installer, supporting 30+ coding agents including Claude Code, Gemini, Copilot, and Cursor, and co-founded the world’s largest agentic skill marketplace. Connect with Rick Hightower on LinkedIn or Medium.

Rick has been actively developing generative AI systems, agents, and agentic workflows for years. He is the author of numerous agentic frameworks and developer tools and brings deep practical expertise to teams looking to adopt AI.

Kafka and Cassandra support, training for AWS EC2 Cassandra 3.0 Training