
Sunday, January 18, 2026

Mastering Agent Development: The Architect Agent Workflow for Creating Robust AI Agents

 


How frustration with upper environment debugging led to a year-long evolution in multi-agent AI workflows

Architect figure at holographic command center orchestrating multiple AI coding agents across floating workspace screens with dramatic lighting

The Problem: Lost in the Fog of War

If you’ve ever actually used an AI coding assistant, you know the deal. One minute, they’re a genius. The next, they’re just dangerously unpredictable.

Sometimes it’s just a feeling of powerlessness.

I remember debugging an upper-environment GCP issue, watching an AI coding assistant modify scripts while I held my breath, hoping it didn’t torch everything. You see it make a change you didn’t ask for and then try to follow the logic. “Oh, it found something that I missed” quickly becomes “What is it doing now? Should I stop it or let it finish?”

I struggled to ingest it all, to get the playback needed for the human-in-the-loop review that often catches mistakes before they become disasters. Great, it fixed it. What did it fix? I now need to make the same fix across different environments and backfill it in a few IaC repos. It works, but how? And is the change it made appropriate for our security posture, and are the other IaC repos now out of date?

Ok, let me ask it: what did you do? How did you fix that last problem? It responds, “What last problem?”, because the context memory has been cycled a few times. What you need is an accurate log of what was done and why. And you need the ability to course-correct.

This tool was built from that exact feeling of frustration. Through this journey, I discovered how to more closely manage and track what a coding assistant (or, these days, a coding agent) should do, and created something truly effective for modifying code when vibe coding is not appropriate.

💡 The Core Problem: AI coding assistants move fast, sometimes too fast. When dealing with production systems or regulated systems where changes can be catastrophic, you need visibility and observability. You need control. You need a second set of eyes.


Split comparison showing chaotic fog of war on left versus organized architect oversight on right with clear visibility and control

And that’s the heart of the issue. These AI tools are incredibly powerful, for sure. But without real oversight, you’re just “vibe coding” your way through a production system. You might get code that works, but you totally lose that professional discipline. Maybe even worse, you have no idea why a change was made. The audit trail is just gone. I do not advocate letting a coding agent loose on a production system.

Getting that extra level of oversight goes beyond vibe coding. This is about building robust agent skills that support existing systems performing important production tasks. When you have multiple repositories, services, and microservices that require coordination, a change in one often necessitates changes in others.

Coordinating these changes across several repos and their builds requires a level of thought that goes beyond being in-the-dirt with a coding agent. You need to step back and have a broader quarterback for your coding agents, ensuring they coordinate correctly and that one isn’t blocking while the others run.

The Architect Agent was born out of frustration and need. It became my framework pattern for using coding assistants and coding agents.

What If We Could Wrap a System Around the AI?

So what’s the solution here? How do we stop just crossing our fingers and hoping for the best?

This project poses a pretty simple question: What if we could wrap a whole system around the AI? Something that acts like a clear blueprint, a project manager, and a quality inspector, all rolled into one, keeping a human firmly in the driver’s seat.

Meet the Architect Agent. It’s a framework designed from the ground up to put a professional, human-centric structure on AI development. The goal is to turn that AI from a loose cannon into a real, dependable collaborator.

The Construction Site Analogy

The easiest way to wrap your head around this is to think about a construction site.

The Architect Agent is your building architect. They’re the one with the detailed blueprints, defining all the materials and standards. They don’t swing hammers or pour concrete. They plan, review, and ensure that everything complies with code. They work with the building inspectors and then with the crews to make sure everything is up to snuff.

The Code Agent is the construction crew on the ground. They don’t design anything. They just follow those blueprints to the letter, executing the actual work with precision. They are smart, able to work around problems, and given latitude. But when they have to deviate from the plan, the architect with the broader view can come in, review their work against the overall plan, and ensure it all fits.

Two totally different jobs, but they’re working together to deliver a single high-quality result. The architect never compromises on standards. The construction crew never freelances on the design.

A Framework for Agent Skill Development: The Plan, Delegate, Grade Workflow

At its core, the Architect Agent follows a simple yet powerful workflow that underpins effective agent development. The whole process boils down to this really elegant, powerful loop:

Plan > Delegate > Grade > Iterate > Learn

  1. Plan: Create detailed, structured instructions for code agents
  2. Delegate: Send instructions to code agents for implementation
  3. Grade: Evaluate completed work against objective rubrics (target: 95% or higher)
  4. Iterate: Guide improvements until quality threshold met
  5. Learn: Update code agent memory with successful patterns

Architect Agent workflow diagram showing Plan, Delegate, Grade, Iterate, Learn cycle with Code Agent execution flow

Here’s the kicker: If that grade is anything less than 95%, the whole loop starts over. The architect gives new feedback, and the code agent tries again until it hits that quality bar. No exceptions.
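
To make the loop concrete, here is a minimal sketch in Python of how the Plan > Delegate > Grade > Iterate > Learn cycle could be driven. The function parameters (plan, delegate, grade, learn) are placeholders for however you invoke your architect and code agents; only the 95-point threshold and the shape of the loop come from the workflow described above.

from typing import Callable, Dict, Optional

PASSING_GRADE = 95  # quality bar: anything lower triggers another iteration

def run_architect_loop(
    plan: Callable[[str], str],                     # Plan: build detailed instructions
    delegate: Callable[[str, Optional[str]], str],  # Delegate: hand work to a code agent
    grade: Callable[[str], Dict],                   # Grade: score against the 100-point rubric
    learn: Callable[[str, str], None],              # Learn: store successful patterns
    task: str,
    max_iterations: int = 5,
) -> Dict:
    """Drive one code agent through Plan > Delegate > Grade > Iterate > Learn."""
    instructions = plan(task)
    feedback: Optional[str] = None
    report: Dict = {}
    for attempt in range(1, max_iterations + 1):
        work = delegate(instructions, feedback)
        report = grade(work)
        if report["score"] >= PASSING_GRADE:
            learn(task, work)
            return {"status": "accepted", "attempts": attempt, **report}
        feedback = report.get("notes")  # Iterate: feed the grade report back as feedback
    return {"status": "needs_human_review", **report}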

What makes this agent development framework different? Real-time monitoring with human interjection points.

Even before code agents finish, I can ask the Architect Agent: What’s the coding agent up to? Is it on the right track?

Sometimes it’s not, and I can stop it and redirect instead of waiting until the end. When running multiple coding agents simultaneously, this additional oversight proves invaluable for developing reliable agent skills.

Also, I might have many of these running concurrently. One might be taking a lot longer than I expected. I can ask the architect agent: “Hey, the ingestion pipeline coding agent seems to be very busy on what should have been a simple task. Can you go check on him and see if we need to redirect?” It happens more often than you think. The architect agent becomes a second set of eyes.

Building Human-in-the-Loop Agent Skills

💡

Key Insight: The Architect Agent isn’t about replacing human judgment. It’s about amplifying it. You get the speed of AI with the oversight of a human.

I find it essential to put the human back in the loop, especially with production systems where you must be certain about what changes are being made, or when coordinating across multiple repositories. This extra level of due diligence is critical for successful agent development.

That was the spirit of the Architect Agent: preventing you from getting lost in the coding agent’s output and providing a second set of eyes to both guide and grade the work.

The human is always in the loop. You get to see the plan before any code is written. You can catch mistakes super early. You’re not waiting until the end to find out the AI went completely off the rails.

And really, what it all comes down to is this: You take that mysterious AI black box and turn it into a transparent partner. You get the control back.

The Dual Instruction Philosophy for Agent Skills

Here’s an innovation that emerged from practical agent development: every instruction has two versions.

When the architect agent creates instructions for the coding agent, it also creates instructions for humans. This means you can:

  • Follow along in real-time
  • Opt to do the task yourself if you prefer
  • Request a summary: “Give me a 25-point bullet list of what’s actually happening right now”

You decide when and how to interject. Even if you’re not going to execute manually, the instructions are laid out so you could.
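
As a concrete illustration of the dual-instruction idea, here is a hypothetical sketch of one plan object that renders both versions: a terse, agent-facing instruction and a readable guide a human could follow manually. The class and field names are mine for illustration, not the skill’s actual format.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Instruction:
    objective: str
    steps: List[str] = field(default_factory=list)
    success_criteria: List[str] = field(default_factory=list)

    def for_code_agent(self) -> str:
        """Terse, imperative version the code agent executes."""
        lines = [f"OBJECTIVE: {self.objective}", "STEPS:"]
        lines += [f"  {i}. {s}" for i, s in enumerate(self.steps, 1)]
        lines += ["SUCCESS CRITERIA:"] + [f"  - {c}" for c in self.success_criteria]
        return "\n".join(lines)

    def for_human(self) -> str:
        """Readable guide a person could follow to do the same work by hand."""
        lines = [f"Goal: {self.objective}", "Manual execution guide:"]
        lines += [f"  Step {i}: {s}" for i, s in enumerate(self.steps, 1)]
        lines += ["You are done when:"] + [f"  - {c}" for c in self.success_criteria]
        return "\n".join(lines)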

Here’s what the human instructions look like in Notion, structured so a human can execute manually when needed (I use a Notion agent skill that I wrote to upload markdown to Notion):

Notion workspace showing human instructions organized with architecture documents, specifications, and manual execution guides

The Notion workspace is my way of keeping myself in the loop with the human instructions organized with architecture documents, specifications, and manual execution guides. I am often reading and planning while the code agent is humming along implementing. The above is my dashboard for a feature that I am implementing.

Phase 2 Manual Execution Guide showing step-by-step commands, expected outputs, and logging session instructions

The “Phase 2 Manual Execution Guide” shows me the step-by-step commands, expected outputs, and logging session instructions. The above is a detailed set of instructions for implementing a single task within a broader plan. I keep all of the plans in Notion or Confluence and follow along, edit, change direction, redirect, etc. as needed. I am in the loop.

This keeps the human in the loop. Yes, it can slow things down, but it makes the outcome more predictable. When dealing with production systems, predictability beats speed. I can have these wonderful planning sessions with approved plans, design docs, and specifications. Then, during the implementation phase, it is not uncommon to catch something. I go back to the architect agent and say: you know we agreed on XYZ, but I noticed the instructions we sent say ABC, and now that I see it, this should be MNLOP.

The Guardrails: Non-Negotiable Professional Standards

How does this system actually enforce such a high standard? It’s not just about that loop. The whole thing is built on a set of guardrails. These are non-negotiable, professional rules baked right into the process.

Mandatory Git Workflow

For all the developers reading this, you’re gonna love this. The AI is forced to use a proper professional Git workflow:

  • No committing directly to the main branch. Ever.
  • Every single change needs its own branch, its own issue, and a formal pull request
  • This isn’t a suggestion. It’s mandatory.

It guarantees you have a full audit trail and a human reviewer for every single change. The days of mysterious commits appearing in production with no explanation are over.
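
To back that rule up mechanically rather than rely on the agent remembering it, a simple client-side guard works well. The sketch below is a hypothetical pre-commit hook in Python that refuses any commit made directly on main; the hook path and the protected branch names are assumptions, not something the Architect Agent skill installs for you.

#!/usr/bin/env python3
"""Pre-commit guard: block direct commits to protected branches.
Save as .git/hooks/pre-commit and make it executable (illustrative only)."""
import subprocess
import sys

PROTECTED_BRANCHES = {"main", "master"}  # assumption: adjust to your repo

def current_branch() -> str:
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    branch = current_branch()
    if branch in PROTECTED_BRANCHES:
        sys.stderr.write(
            f"Blocked: direct commits to '{branch}' are not allowed. "
            "Create a feature branch and open a pull request instead.\n"
        )
        sys.exit(1)  # nonzero exit aborts the commit
    sys.exit(0)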

Does the code agent forget? Yes, it does happen. This is where the grading feedback loop comes in.

The 100-Point Rubric

Remember that grade report concept? It’s not made up on the fly. It’s all based on a comprehensive 100-point rubric that covers everything you’d expect in a professional environment:


The 100-point rubric evaluates code agent work across six critical dimensions:

  • Completeness (25 points): Measures whether all requirements and success criteria were fully met. This is the largest category, reflecting that delivering what was asked for is paramount.
  • Code Quality (20 points): Evaluates correctness, maintainability, clarity, and adherence to best practices. High-quality code should be easy to understand, modify, and extend.
  • Testing & Verification (20 points): Assesses whether automated tests run and pass, coverage meets or exceeds 60%, and all actions are properly verified. This ensures reliability and catches regressions early.
  • Documentation (15 points): Examines the quality of logs, change documentation, READMEs, and inline comments. Good documentation makes code accessible to future developers and aids maintenance.
  • Resilience & Adaptability (10 points): Measures the ability to recover from errors, handle edge cases, and apply robust workarounds. Resilient code gracefully handles unexpected situations.
  • Logging & Traceability (10 points): Evaluates real-time, structured logs with timestamps and clear decision points. Good logging makes debugging and auditing straightforward.

Target: 95 points or higher for successful completion.

Automatic Grade Caps: Forcing Quality from the Start

This is a brilliant feature to hammer home best practices. The system has automatic grade caps:

  • Unit tests not run: Maximum grade capped at D (65%)
  • Unit tests fail: Maximum grade capped at F (50%) — UNACCEPTABLE
  • Test coverage below 60%: Maximum grade capped at C- (70%)

Get this: If the code agent turns in code with zero unit tests, it literally doesn’t matter how amazing the rest of it is. The absolute maximum score it can get is 65%. A D. That’s a fail.

It basically forces the AI to build quality in from the start.
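
Put together, the rubric and the caps behave like this. The sketch below uses the category weights and cap thresholds described above; the function itself is illustrative, not the skill’s actual grading code.

RUBRIC_MAX = {                       # category weights from the 100-point rubric
    "completeness": 25,
    "code_quality": 20,
    "testing_verification": 20,
    "documentation": 15,
    "resilience_adaptability": 10,
    "logging_traceability": 10,
}

def final_grade(scores: dict, tests_ran: bool, tests_passed: bool, coverage: float) -> int:
    """Sum category scores, then apply the automatic grade caps."""
    total = sum(min(scores.get(cat, 0), cap) for cat, cap in RUBRIC_MAX.items())
    if not tests_ran:
        total = min(total, 65)       # unit tests not run: capped at D (65%)
    elif not tests_passed:
        total = min(total, 50)       # unit tests fail: capped at F (50%)
    elif coverage < 0.60:
        total = min(total, 70)       # coverage below 60%: capped at C- (70%)
    return total

# Strong work that never ran its tests still tops out at 65 (a D).
print(final_grade(
    {"completeness": 24, "code_quality": 19, "documentation": 14,
     "resilience_adaptability": 9, "logging_traceability": 9},
    tests_ran=False, tests_passed=False, coverage=0.0))   # -> 65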

Here’s what a real grade summary looks like from a recent project:

Grade Summary showing Phase 1 Ingestion CLI with A- (92/100) score, detailed category breakdown including File Structure, CLI Commands, and Pydantic Models

This isn’t some vague “Hey, good job” pat on the back. It’s a real, itemized report card. You can see exactly where the code agent nailed it, like getting a perfect score on the file structure, but oops, it lost a couple of points because it forgot to implement a --force flag. It was also flagged for insufficient logging.

You can then take those grades and make improvements to the code agent’s instructions or spec, using agent skills like project memory to capture what was learned. It is about continuous improvement: use the grade to improve outcomes and avoid similar mistakes in the future.

All of that adds up to the final score: 92 out of 100. That’s an A-minus, which is pretty good, but it’s not 95. Based on those notes, the architect generates new instructions to fix those problems and sends the code agent back to work for another round.

SKILZ-77 Complete Summary table showing all phases with grades from A- to A, average grade A- (9.43/10)
VS Code workspace showing Phase 4 grading complete with 10/10 scores across Instruction Compliance, Code Quality, Testing, and Protocol Compliance

Our Evolution in Agent Development: From Prompts to Reusable Agent Skills

The Architect Agent has evolved significantly over time, mirroring the broader evolution in agent development practices.

It started as a project in Claude Desktop where I copied markdown files into Claude Desktop Projects (we had a tool to make this easier). These became instruction files: markdown documents I wrote in Notion, exported, and used as initial prompts to build out the Architect Agent. Then we added a filesystem MCP so Claude Desktop could access the code agent repos directly.

Eventually, the Architect Agent became an agentic skill because I got tired of copying prompts around every time I needed a new instance. This is a common pattern in agent skill development: start manual, then automate.

This project is at least a year old, probably closer to two. I’ve been using it for over a year, and it has evolved during that time.

💡

The key realization: Agent skills didn’t exist a year ago. Claude Code wasn’t where it is now. The coding assistants weren’t where they are now. But this idea of a project as an architect with coding assistance has been around for a while, and it evolved as the needs evolved.

Passive Logging: Essential for Agent Skill Development (And Your Wallet)

One of the biggest improvements in my agent development journey came from rethinking how logging and auditing works.

Before, logging was active: the LLM generated logs manually. This meant the coding agent had to:

  • Take a few steps
  • Stop and write about it
  • Take more steps
  • Stop and write about it

Trying to manually log every single thing the AI does would burn through a ton of expensive AI tokens. So instead, the system now uses clever, automated hooks that just log when a tool is used. Auditing AI Agents with hooks is more deterministic, and it saves tokens.

With hooks and plugins, logging became passive. No more slowdowns. No more constant reminders.

The result? A 60 to 70% reduction in token costs just for logging. That’s real money and time saved.

Press enter or click to view image in full size
Automated logging hooks configured for Claude Code and OpenCode, showing session started with all tool usage captured automatically

We configure automated logging hooks for Claude Code and plugins for OpenCode.
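
For readers who want to try the passive approach, here is a minimal sketch of what such a logging hook can look like. It assumes the agent’s hook or plugin mechanism pipes a JSON payload describing each tool call to the script’s stdin; the payload fields and the log file location are assumptions, and the real skill ships its own hooks and plugins.

#!/usr/bin/env python3
"""Passive logging hook sketch: append one JSON line per tool call."""
import json
import sys
from datetime import datetime, timezone
from pathlib import Path

LOG_FILE = Path.home() / ".architect-agent" / "session.jsonl"  # hypothetical location

def main() -> None:
    raw = sys.stdin.read()                       # whatever the agent sends the hook
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        payload = {"raw": raw}                   # never crash the agent over a log entry
    entry = {"ts": datetime.now(timezone.utc).isoformat(), "event": payload}
    LOG_FILE.parent.mkdir(parents=True, exist_ok=True)
    with LOG_FILE.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")       # one structured line per tool use

if __name__ == "__main__":
    main()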

💡

My Preferred Setup for Agent Development:
  • Architect Agent: Claude Code (consistent access to Opus 4.5)
  • Coding Agent: OpenCode (plugins handle logging without consuming tokens)
With OpenCode I use Claude Sonnet 4.5 through the GitHub Copilot login, Grok Code Fast, and Gemini 3 Pro & Flash.

💡

Logging and monitoring occur in the background. That’s a significant advantage for developing robust agent skills.

OpenCode does not support hooks, but it does support plugins, so I was able to replicate most of the passive audit logging with OpenCode plugins.

Case Study: A Multi-Phase Agent Skill Development Project in Action

Let me walk you through a complete cycle using actual screenshots from a recent project. This shows the full Plan > Delegate > Grade > Iterate flow in practice.

How to Develop Custom Agent Skills: A Step-by-Step Breakdown

Step 1: Instructions Created and Sent

The architect agent creates detailed technical instructions and sends them to the code agent. Notice the 15-point summary: clear, actionable, measurable.

Architect agent sending instructions to skill-scanner code agent with Phase 2 summary including objectives, new submissions, verification steps, and success criteria

The Architect agent sending instructions to the skill-scanner code agent (skill-scanner is a project I am working on) with a Phase 2 summary including objectives, new submissions, verification steps, and success criteria. It even offers to send me a copy to my Notion workspace, and I usually have a dashboard in Notion where I am monitoring one or two architect agents managing four or five coding agents.

Step 2: Human Instructions Published

Simultaneously, human instructions are created and uploaded to Notion. Every action tracked with a status checkbox.

Human instructions uploaded to Notion showing completed actions: Phase 2 instruction file created, human summary created, instructions sent, uploaded to Notion

The above shows the human instructions uploaded to Notion with the completed actions: Phase 2 instruction file created, human summary created, instructions sent, and everything published where I can review it (and sometimes edit and provide feedback).

Step 3: Code Agent Executes (with Monitoring)

The code agent implements the work while logging progress. At any point, I can check in, and sometimes issues are caught mid-execution.

Code agent detecting two issues during execution: private repos being scanned when they shouldn’t be, and repos without skills being added to database

Here the code agent detected two issues during execution: private repos being scanned when they shouldn’t be, and repos without skills being added to the database. In reality, it was me who saw and complained about the private repo being scanned, but since it ended up in the code agent’s logs, the architect agent attributed it to the code agent. I am often checking on and redirecting coding agents. This also allows me to redirect at the plan / spec level.

Step 4: Bug Detection and Fix Instructions

When issues are found, the architect agent creates targeted fix instructions with specific code changes.

Detailed bug fix instructions showing exact code for private repo filtering using isPrivate flag and empty repo check before database insertion

This shows how the Architect Agent and I redirected the coding agent, with detailed bug-fix instructions showing exact code for private repo filtering using the isPrivate flag and an empty-repo check before database insertion.

Step 5: Implementation and Iteration

The code agent implements the improvements. Notice the structured response with numbered improvements.

Code agent implementing force flag logic, enhanced session logging following Hybrid Logging v2.0 protocol, and refined database integration

Here the code agent receives the updated instructions and implements the force-flag logic, enhanced session logging following the Hybrid Logging v2.0 protocol, and refined database integration as requested.

Step 6: PR Created, Success Criteria Met

Finally, the work is complete with a PR created and all success criteria verified.

Phase 4 implementation complete showing two-tier analysis model operational with success criteria verification table and PR link

My go-to for code agents is OpenCode running Grok 4 Code Fast, but if the work is visual (screenshots), I use OpenCode with Gemini 3 Flash. I can use Codex, Gemini, or Claude Code for coding agents or architect agents, but Claude Code and OpenCode work best due to their support for lifecycle logging via hooks and plugins.

Multi-Agent Systems: Coordinating Agent Skills Across Repositories

This is where the “quarterback” metaphor becomes concrete.

When you have multiple repositories, services, and microservices requiring coordination, a change in one often necessitates changes in others. The architect agent maintains the big picture. This is essential for managing complex multi-agent systems.

Multi-agent coordination diagram showing Human Operator at top providing oversight to Architect Agent quarterback, who coordinates three Code Agents working on Backend, Frontend, and Infrastructure

This approach enables multi-agent coordination while more readily allowing for human-in-the-loop interaction: the human operator at the top provides oversight to the Architect Agent quarterback, who coordinates three code agents working on the backend, frontend, and infrastructure at the same time.

This AI agent framework also helps with parallel development. You get the big picture while making changes across several repos with different coding agents. One agent isn’t blocking while the others run.
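
A rough sketch of that non-blocking, quarterback-style dispatch is below. The dispatch_instructions callable is a placeholder for however you actually start a code agent (CLI invocation, API call, etc.); the point is simply that instructions fan out in parallel and a stuck agent gets flagged for human review rather than silently holding everything up.

from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

def coordinate(
    dispatch_instructions: Callable[[str, str], str],  # placeholder: start a code agent
    assignments: Dict[str, str],                        # repo name -> instruction text
    timeout_seconds: float = 3600,
) -> Dict[str, str]:
    """Fan instructions out to several code agents and collect results independently."""
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(assignments))) as pool:
        futures = {
            repo: pool.submit(dispatch_instructions, repo, text)
            for repo, text in assignments.items()
        }
        for repo, future in futures.items():
            try:
                results[repo] = future.result(timeout=timeout_seconds)
            except Exception as exc:  # a stuck or failed agent gets flagged, not ignored
                results[repo] = f"needs human review: {exc}"
    return results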

I also tend to break up my projects into smaller pieces because giant monoliths are hard for coding agents to wrap their heads around. This divide and conquer approach seems to work well with the current breed of tools.

The Big Payoff: From Black Box to Transparent Partner

So we’ve got this really robust system, right? Planning, grading, quality control. But what’s the big payoff here? Why does all this extra structure matter so much?

Here’s why this is a game changer:

  1. Predictable Quality: You’re swapping unpredictable “hope for the best” outcomes for a predictable, quality-first process
  2. Early Mistake Detection: The human is always in the loop. You see the plan before any code is written, catching mistakes super early
  3. Full Audit Trail: Every change has a branch, an issue, a PR, and a grade report. Nothing happens in the shadows. No more having this conversation with the coding agent “How did you fix that?” followed by “Fix what?” “The thing with the IAM policies.” “I don’t know what you’re talking about.”
  4. Cost Efficiency: Passive logging cuts token costs by 60–70%
  5. Cross-Platform Flexibility: Works with Claude Code, OpenCode, Gemini, Codex, and 10+ other agents (but mostly Claude Code and OpenCode)

What it all comes down to is this: You take that mysterious AI black box and turn it into a transparent partner. You get the control back.

Best Practices for Agent Development: Speed vs Control

Let’s be honest about the trade-offs in agent skill development.

There are different levels of control, and some depend on how much time I have. Sometimes, when coding with coding agents, I also apply agent skills from spec-driven development. The more spec and planning I put in, the slower it moves forward, but the more precise the results afterward.

💡

The paradox: The more controlled something is, the less human interaction it needs. I’m a limited resource, so that’s a good thing. But the more controls you implement, the slower the process.

Sometimes you have to go slow to go fast. If that makes sense, you know what I’m talking about. — Rick Hightower

Installation: Your First Step in Agent Skill Development

Ready to try it? Here’s how to get started with your own agent development workflow.

Using skilz CLI (Recommended)

The easiest way to install agent skills across any AI coding assistant is to use the skilz agent skill installer:

pip install skilz
# Claude Code (user-level, available in all projects)
skilz install https://github.com/SpillwaveSolutions/architect-agent
# Claude Code (project-level)
skilz install https://github.com/SpillwaveSolutions/architect-agent --project
# OpenCode
skilz install -g https://github.com/SpillwaveSolutions/architect-agent --agent opencode
# Gemini, Codex, and 14+ other agents supported
skilz install https://github.com/SpillwaveSolutions/architect-agent --agent gemini

Quick Start Commands for Agent Development

Once installed, trigger the skill with natural language:

  • “write instructions for code agent”: Create detailed technical instructions
  • “initialize architect workspace”: Set up workspace structure
  • “grade the code agent’s work”: Evaluate completed work against rubric
  • “send instructions to code agent”: Copy instructions for execution

Automated Workspace Setup

For the fastest start, use the templates:

cd ~/.claude/skills/architect-agent/templates/

# Create code agent workspace
./setup-workspace.sh code-agent ~/projects/my-code-agent
# Create architect workspace
./setup-workspace.sh architect ~/projects/my-architect \
--code-agent-path ~/projects/my-code-agent

Complete setup in less than 5 minutes. You can just ask the skill to set up the code agent or the architect agent folder, and it will. It has all of the scripts to set up code agent folders and architect agent folders. It even installs the right hook or plugin for logging and observability.

The Future of Agent Skill Development: From Skills to Plugins

The next big thing, and I don’t know when this will happen because I have a day job and a bunch of other projects going on, would be to make this into an Agent plugin.

Most of this is background activity, not my main focus. I have a set of commands, hooks, and OpenCode plugins. I’d like to take this skill and turn it into a full-blown Claude Code plugin.

But once I do that, it won’t work as well with OpenCode as it does today. There’s the aspect of evolution when you’re using multiple coding assistants. I always use Codex and Gemini. It would be nice to support them all, and right now the best way to do that is via agent skills, since unlike plugins, agent skills are a standard.

A Question for the Future

Maybe the future of AI and development isn’t just about making the AI smarter and letting it run wild. Maybe it’s about us getting smarter about how we manage it. Building these kinds of sophisticated frameworks around it.

Is this how we build a future? By making AI a true partner, governed by our standards, instead of just a powerful but ultimately unreliable tool?

Key Takeaways for Agent Development

  1. Born from frustration: Real problems drive real solutions. The fog of war with production debugging demanded better oversight.
  2. Human-in-the-loop is essential: Especially for production systems. You need certainty about what changes are being made.
  3. Objective grading enables iteration: Not just “done,” but measurably good. Target 95%+ quality.
  4. Guardrails matter: Mandatory Git workflow, automatic grade caps, and the 100-point rubric force professional standards.
  5. Passive logging beats manual logging: Hooks and plugins capture everything without slowing you down, cutting costs by 60–70%.
  6. Cross-platform agent skills future-proof your workflows: Works with Claude Code, OpenCode, Gemini, Codex, and 10+ other agents.
  7. Go slow to go fast: The more spec and planning, the slower the start but the more precise the results.


Resources for Agent Skill Development

The coding and architecture agents evolved alongside the underlying systems as I found time to improve them. I do other things, so this is only as evolved as I need it to be for daily use. It’s a combination I use a lot because it allows me to get involved.

Tags: #AgentSkills #AgentDevelopment #ClaudeCode #AIAssistant #MultiAgentWorkflows #SoftwareArchitecture #DeveloperProductivity #HumanInTheLoop

About the Author

Rick Hightower is a technology executive and data engineer with extensive experience at a Fortune 100 financial services organization, where he led the development of advanced Machine Learning and AI solutions to optimize customer experience metrics. His expertise spans both theoretical AI frameworks and practical enterprise implementation.

Rick wrote the skilz universal agent skill installer, which works with Gemini, Claude Code, Codex, OpenCode, GitHub Copilot CLI, Cursor, Aider, Qwen Code, Kimi Code, and about 14 other coding agents. He is also the co-founder of the world’s largest agentic skill marketplace.

Connect with Rick Hightower on LinkedIn or Medium for insights on enterprise AI implementation and strategy.

Community Extensions & Resources

The Claude Code community has developed powerful extensions that enhance its capabilities. Here are some valuable resources from Spillwave Solutions (Spillwave Solutions Home Page):

Integration Skills

