
Monday, January 12, 2026

Claude Agent Skills Conceptual Deep Dive

 

Progressive Disclosure Architecture (PDA) visualized as a futuristic AI librarian accessing three glowing tiers of knowledge representing metadata, instructions, and resources. Claude Skills uses PDA.


Why teaching an AI with manuals beats giving it a list of functions: How Anthropic’s three-tier loading system achieves 90% token reduction via Agent Skills

Section 1: The Context Window Bottleneck

If you’ve built a production-grade AI agent, you’ve likely run headfirst into one of the most expensive and frustrating limitations in the field: the context window. It’s the AI’s short-term memory, and it’s shockingly small relative to our ambitions for these systems.

The problem appears immediately when you try to give your AI specialized knowledge. Imagine you want Claude to help manage your company’s Jira tickets, generate PlantUML diagrams according to your team’s standards, or process documents using your specific compliance rules. The naive approach — what I call the “documentation dump” — seems logical: just stuff all the relevant documentation into the system prompt.

Here’s what happens in practice.

When Good Intentions Go Wrong: A Real Failure Story

Picture this: It’s 2:00 AM. Your team’s AI diagramming assistant just crashed during a critical architecture review presentation. The CTO is waiting. The error message? “Context limit exceeded.”

You’d loaded PlantUML and Mermaid documentation with tons of examples and few-shot prompts. There goes 120,000 to 300,000 tokens before the user even asks a single question. At current API pricing ($3 per million input tokens for Claude 3.5 Sonnet), you’re burning $0.36 to $0.90 per request on context alone. The agent hasn’t produced a single diagram yet.

The cost isn’t just financial. Loading that much upfront information creates catastrophic performance degradation:

  • Latency: 8–15 seconds of processing before the agent can even begin reasoning about the user’s actual request. Your users stare at loading spinners.
  • Dilution: The truly relevant information, perhaps 5–10% of what was loaded, is drowned in a sea of unused reference material. The AI struggles to find the signal in the noise, and more noise means more chances of it latching onto the wrong thing.
  • Waste: You’re paying to load extensive documentation about sequence diagrams when the user just wants a simple class diagram. It’s like forcing someone to read an entire encyclopedia to answer a single question.

It all eventually boils down to context engineering. Sure, you can load up the context with a shotgun blast of everything you think you might need, but it is much better to add the context you need surgically.

💡 Key Takeaway: The documentation dump approach doesn’t just waste tokens — it actively degrades AI performance through information overload and latency.

The Math That Breaks Your Budget

Consider the math. If you’re building an enterprise AI assistant with capabilities for:

  • Document processing (PDF, DOCX, XLSX, PPTX)
  • Diagramming (PlantUML, Mermaid, D3.js)
  • Productivity integrations (Jira, Confluence, Notion, GitHub)
  • DevOps automation (Terraform, Pulumi, kubectl)

and you try to load complete documentation for all of these upfront, you’re looking at potentially 500,000+ tokens before the conversation even starts. Most models have a 200,000-token context window. You’ve already exceeded capacity by 2.5x, and the user hasn’t asked for anything yet.
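As a sanity check on those numbers, here is a tiny back-of-the-envelope script. The per-capability token counts are assumptions chosen to sum to the 500,000-token figure above; the $3-per-million price is the Claude 3.5 Sonnet input rate quoted earlier.

# Back-of-the-envelope cost of the "documentation dump" approach.
# Per-capability token counts are assumptions; adjust them for your own stack.

PRICE_PER_INPUT_TOKEN = 3.00 / 1_000_000   # $3 per million input tokens
CONTEXT_WINDOW = 200_000                    # typical context window size

capability_docs = {
    "document processing (PDF, DOCX, XLSX, PPTX)": 150_000,
    "diagramming (PlantUML, Mermaid, D3.js)": 120_000,
    "productivity (Jira, Confluence, Notion, GitHub)": 130_000,
    "devops (Terraform, Pulumi, kubectl)": 100_000,
}

total_tokens = sum(capability_docs.values())
print(f"Upfront documentation: {total_tokens:,} tokens")
print(f"Context window usage:  {total_tokens / CONTEXT_WINDOW:.1f}x the 200K limit")
print(f"Cost per request:      ${total_tokens * PRICE_PER_INPUT_TOKEN:.2f} before any real work")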

Real-World Impact:

  • Imagine a Fortune 500 company that tried this approach for its internal AI assistant
  • Result: Could only support 3 to 4 capabilities before hitting context limits
  • Cost: $2.50 per conversation on average (mostly wasted on unused documentation)
  • Outcome: Project shelved after 3 months due to unsustainable costs

The Scaling Crisis

This is not a theoretical problem. According to research from Anthropic, teams attempting to build comprehensive AI agents were hitting context limits with just a handful of specialized capabilities. The traditional approach simply doesn’t scale. You can’t load 50 MCP servers per request either. And MCP is method/function discovery; it is not as good as specifying a process. It gives you the 500 tools that Jira supports, not the four things you actually need from the Jira CLI or API for the exact task at hand.

Here’s the brutal reality:

  • 1–2 tools: 50,000–100,000 tokens | $0.15-$0.30 per request | 3–5s latency | ✅ Barely feasible
  • 3–5 tools: 150,000–250,000 tokens | $0.45-$0.75 per request | 8–12s latency | ⚠️ Expensive
  • 6–10 tools: 300,000–500,000 tokens | $0.90-$1.50 per request | 15–25s latency | ❌ Impossible
  • 10+ tools: 500,000+ tokens | $1.50+ per request | 30s+ latency | ❌ Not viable

The industry needed a fundamentally different architecture; one that could give an AI access to vast libraries of knowledge without drowning it in irrelevant information. The solution isn’t a bigger context window. It’s smarter loading.

It is not just Anthropic saying it either.

Industry Reports & Analyst Briefings

Enterprise Case Studies & Context Engineering Post-Mortems

Technical Guides & Explanations

These sources confirm, through industry analysis and practical case studies, that context management and context window engineering are major pain points for enterprise AI agent adoption and scaling. Of course, if you have experienced it firsthand, you already know it is a problem.

OK. We have identified the pain, so what is the solution?

💡 Note: Claude Agent Skills are not just available via Claude Code or Claude Desktop. They are available in Anthropic’s new Agent framework and are making their way into other agentic coding tools and frameworks. PDA has a lot of applications and simplifies agent development. Many agentic frameworks had similar techniques for managing context, but this is a nice, standard way to handle it; it might be more important than MCP. It is no longer just Claude Code: Codex, GitHub Copilot, and OpenCode have all announced support for agent skills. There is even a marketplace for agent skills, with a universal installer, that supports Gemini, Aider, Qwen Code, Kimi K2 Code, Cursor, and more (14+ coding agents and counting).

Section 2: Enter Progressive Disclosure Architecture

Comparison of traditional documentation dump approach consuming 300K tokens versus Progressive Disclosure Architecture using only 15K tokens showing 95% reduction. I know it says 90% but who is mathing. Are you mathing? I am mathing.

Anthropic’s answer to the context bottleneck is called Progressive Disclosure Architecture (PDA), and it represents a philosophical shift in how we think about AI knowledge management.

Instead of treating an AI like a computer program that needs all its libraries loaded into RAM before execution, PDA treats the AI like a new team member being onboarded with manuals and runbooks. You don’t hand a new employee every manual in the company library on day one. You give them a directory of what’s available, and they fetch the specific manual they need when they need it.

My Own Journey with Progressive Disclosure (Before It Had a Name)

Now, I have to admit something: I’ve used this PDA technique before on projects, long before I knew it was called Progressive Disclosure Architecture, or before Claude Skills even existed. I thought this technique was called “agentic RAG,” but Anthropic calls it PDA, and who am I to argue? It’s a form of context engineering, and giving something a good, distinct name is important. We must identify patterns to more readily explain them.

Here’s what happened in my experience:

I was working with large technical documents tied to a specific industry. Tons of technical jargon. And I wasn’t getting very good results with regular RAG (vector index-based similarity search). Keyword searches weren’t helping either. Due to a time crunch, training my own tokenizer was not in the cards. The problem was the highly specialized jargon and the way information was structured across these massive PDFs.

So here’s what I did. I had a list of questions I was trying to answer. Instead of dumping entire documents into the context (they were way too big to fit in Claude’s context window, or Gemini’s for that matter) or relying on embeddings to find the right chunks, I would:

  1. Load the PDF TOC and have the LLM examine the table of contents
  2. Present my list of questions to the model
  3. Ask the LLM to decide which sections to load from the table of contents to best answer those questions
  4. Only then pull in the selected sections for detailed analysis
  5. Side note: I also always loaded the first 10 or so pages, because quite a few of the answers (about half, so the Pareto principle strikes again) were on the cover sheet and in the introduction. I would then use the answers from those first pages to help find the rest as I loaded multiple sections at a time looking for answers.

This way I was able to process a very large tome (often 16 MB to 32 MB in size) quickly.

With this technique, I got excellent results: enough to answer all the questions I needed and extract the data I required from these large technical documents with their very specific jargon. The LLM could reason about which knowledge it needed before loading it, rather than hoping a similarity search would find the right passage.
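Here is a minimal sketch of that loop. The three helper stubs (ask_llm, extract_toc, extract_section) are hypothetical placeholders; wire them to whatever LLM client and PDF library you use.

# Sketch of the TOC-first loading loop described above.
# ask_llm, extract_toc, and extract_section are hypothetical stubs:
# connect them to your own LLM client and PDF library.

def ask_llm(prompt: str) -> str:
    """Send a prompt to the LLM and return its text reply."""
    raise NotImplementedError

def extract_toc(pdf_path: str) -> str:
    """Return the table of contents as plain text (section titles and page ranges)."""
    raise NotImplementedError

def extract_section(pdf_path: str, title: str) -> str:
    """Return the text of one named section."""
    raise NotImplementedError

def answer_from_tome(pdf_path: str, questions: list[str], intro_text: str) -> str:
    toc = extract_toc(pdf_path)

    # Steps 1-3: show the model the TOC plus the questions; let it pick sections.
    plan = ask_llm(
        "Table of contents:\n" + toc
        + "\n\nQuestions:\n" + "\n".join(questions)
        + "\n\nList only the section titles needed to answer these questions, one per line."
    )

    # Step 4: pull in just the chosen sections, plus the first ~10 pages (intro_text).
    selected = [extract_section(pdf_path, t.strip()) for t in plan.splitlines() if t.strip()]
    context = intro_text + "\n\n" + "\n\n".join(selected)

    return ask_llm("Context:\n" + context + "\n\nAnswer these questions:\n" + "\n".join(questions))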

💡 The Connection to Claude Agent Skills: You can think of Claude Skills a lot like having the agent read that table of contents — where the table of contents is your skill’s metadata (Tier 1), and the chapters and sections are the resources and references bundled within your Claude Skill (Tiers 2 and 3), which we’ll get into later.

I’m adding this personal experience to the article to show my own expertise and understanding of context engineering, and to help explain how PDA works through a real-world lens. The pattern I discovered independently is exactly what Anthropic formalized with Progressive Disclosure Architecture. It’s about loading knowledge surgically, not with a shotgun. I have some other examples where I have done similar pattern matching and then loading of additional context, but I will leave that for another article.

The difference now? With Claude Agent Skills, this pattern is standardized, systematic, and built into the framework. What I had to manually orchestrate for each project is now a repeatable architecture that scales across hundreds of capabilities.

The Librarian Analogy

Think of Claude as a brilliant librarian with a small desk (the context window). The library itself is vast — it contains thousands of specialized knowledge packages called “Skills.” The librarian doesn’t pile every book on the desk at once. Instead, the library is organized with a simple three-step system:

Step 1: The Card Catalog (Metadata Layer) When starting work, the librarian quickly scans a tiny card catalog. Each card has just two things: a skill’s name and a brief description. This takes almost no desk space — about 100 tokens per skill. The librarian instantly knows what’s available across potentially hundreds of capabilities without reading a single full manual.

Step 2: Fetching the Manual (SKILL.md Layer) When a request comes in that matches a card in the catalog, the librarian walks to the shelf and retrieves only that one manual — the SKILL.md file. This contains detailed, step-by-step instructions for the task at hand. Typical cost: 1,000-5,000 tokens. The desk is occupied only by immediately relevant information.

Step 3: Using Specialized Tools (Bundled Resources Layer) Finally, the manual might direct the librarian to use tools bundled with it: Python scripts, reference documents, configuration templates. These are pulled from the skill’s directory only at the exact moment the instructions call for them. Because they’re accessed via filesystem reads rather than loaded into context, they represent “effectively unbounded” knowledge.

This elegant system solves the fundamental problem: how to give an AI access to vast knowledge without overwhelming its limited working memory.

💡 Key Takeaway: Progressive Disclosure Architecture transforms the context window from a hard limit into a working memory — loading only what’s needed, when it’s needed.

Side Quest Exercise: Just to see if you are tracking

I wrote two skills: one for Mermaid and one for PlantUML. I also built a tool to help me debug and test how well Skills implement PDA. It’s a visual tool and still a work in progress.

Just by looking at the screenshots of my Mermaid Claude Skill and my PlantUML Claude Skill, which one do you think implements PDA better? Leave me a comment below.

Screen Shot of PUML Skill that I wrote loaded up in my Skills Debugger Tool
Screen Shot of Mermaid Skill that I wrote loaded up in my Skills Debugger Tool

The Philosophical Shift: Declarative vs. Imperative

What makes PDA fundamentally different from competing approaches (like OpenAI’s Tools) is its philosophical paradigm.

OpenAI Tools: The Imperative Approach (“Call This Function”) OpenAI’s model works like a traditional RPC (Remote Procedure Call) system. You define functions with JSON schemas — strict contracts specifying inputs, outputs, and types. The AI’s job is to analyze a request and output a perfectly formatted JSON object that calls one of these functions. It’s fast, predictable, and excellent for discrete API integrations.

Claude Skills: The Declarative Approach (“Here’s a Manual”) Claude Skills, by contrast, provide procedural knowledge in human-readable Markdown. Instead of “call function create_diagram(type: str, content: str)", a Skill says: "To create a diagram, first determine the type needed, then consult the syntax reference in references/syntax.md for that diagram type, then generate the output using the conversion script in scripts/."

The declarative approach trades some precision for enormous flexibility. It can handle ambiguous, multi-step workflows where the exact sequence of operations isn’t known until the AI reasons about the user’s request. It’s perfect for codifying complex organizational procedures that would be impossibly rigid to express as function schemas.
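To make the contrast concrete, here is a hedged sketch. The dictionary follows the general shape of an OpenAI-style function schema (create_diagram itself is hypothetical), and the string mirrors the kind of procedural instructions a SKILL.md carries; neither is copied from a real integration.

# Imperative style: a strict function contract the model must fill in exactly.
create_diagram_tool = {
    "name": "create_diagram",
    "description": "Create a diagram and return it as an image.",
    "parameters": {
        "type": "object",
        "properties": {
            "type": {"type": "string", "enum": ["sequence", "class", "component"]},
            "content": {"type": "string", "description": "Diagram source markup"},
        },
        "required": ["type", "content"],
    },
}

# Declarative style: a procedure the model reads, reasons about, and adapts.
diagram_skill_instructions = """
1. Determine which diagram type the user needs.
2. Consult references/syntax.md for that diagram type.
3. Generate the PlantUML markup.
4. Run scripts/convert_puml.py to render PNG/SVG.
5. Return the image and offer to make adjustments.
"""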

Why This Changes Everything

Progressive Disclosure achieves what seems impossible: an AI agent that’s simultaneously deeply specialized and highly efficient.

  • Specialized: It has access to comprehensive knowledge about hundreds of domains — diagramming, productivity tools, document processing, infrastructure automation.
  • Efficient: It only loads what it needs, when it needs it, keeping context consumption minimal.

According to Anthropic’s internal testing, PDA reduces context window bloat by up to 90% compared to documentation dump approaches. That’s not an incremental improvement. That’s a paradigm shift.

Real-World Impact:

  • Before PDA: Teams limited to 3–5 specialized capabilities
  • After PDA: Same teams deploying 20–30 capabilities in the same context budget
  • Cost Reduction: From $2.50 per conversation to $0.15-$0.30 per conversation
  • Performance: Latency reduced from 15–25 seconds to 2–4 seconds

In the next sections, we’ll see exactly how this works in practice with two real-world examples: Diagramming Skills and Productivity Skills.


Section 3: The Three-Tier Loading Mechanism

Three-tier Progressive Disclosure Architecture showing Tier 1 metadata index, Tier 2 on-demand SKILL.md loading, and Tier 3 dynamic resource bundling with token costs labeled

Before diving into examples, let’s examine the technical architecture that makes Progressive Disclosure work. Understanding the three tiers is essential to appreciating why this approach achieves such dramatic efficiency gains.

At a Glance: The Three Tiers

Tiers of PDA for Claude Skills
  • Tier 1: Metadata — YAML frontmatter with name and description loaded at session start for all skills (~100 tokens per skill) for skill discovery
  • Tier 2: Instructions — SKILL.md with procedural steps loaded when skill is selected (1,000–5,000 tokens) for task guidance
  • Tier 3: Resources — Scripts, docs, and templates loaded on-demand by Tier 2 (0 tokens via filesystem) for deep knowledge access

Tier 1: System-Wide Metadata Index

At session initialization, Claude loads only the YAML frontmatter from each available skill’s SKILL.md file. This frontmatter contains:

name: "plantuml-diagrams"
description: "Creates PlantUML diagrams including sequence, class, component, and deployment diagrams. Converts to PNG/SVG formats."
version: "1.2.0"
allowed-tools: ["Bash", "Python"]

That’s it. No implementation details, no syntax guides, no scripts. Just enough information for Claude to understand:

  1. What the skill does
  2. When it might be relevant
  3. What permissions it requires

Token cost: ~100 tokens per skill. Benefit: Claude can be aware of hundreds of capabilities while consuming minimal context.

This creates what I call an “index of capabilities”: a lightweight awareness layer that enables intelligent skill selection without upfront loading costs. This is your table of contents for a large tome.

💡 Key Takeaway: Tier 1 is like a table of contents; it tells Claude what’s available without loading the actual content. This is the secret to supporting hundreds of skills simultaneously.
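Here is a minimal sketch of how such an index could be built outside of Claude, assuming skills live in per-skill directories containing a SKILL.md with YAML frontmatter and that PyYAML is installed; only the name and description ever reach the context window.

# Build the Tier 1 "card catalog": read only the YAML frontmatter of each SKILL.md.
# Assumes a skills/<skill-name>/SKILL.md layout and the PyYAML package.
from pathlib import Path

import yaml

def load_skill_index(skills_root: str) -> list[dict]:
    index = []
    for skill_md in Path(skills_root).glob("*/SKILL.md"):
        text = skill_md.read_text(encoding="utf-8")
        if text.startswith("---"):
            frontmatter = text.split("---", 2)[1]   # text between the first two --- fences
        else:
            frontmatter = text.split("\n\n", 1)[0]  # fallback: the first block of lines
        meta = yaml.safe_load(frontmatter) or {}
        index.append({
            "name": meta.get("name", skill_md.parent.name),
            "description": meta.get("description", ""),
            "path": str(skill_md),                  # Tier 2 loads from here, on demand
        })
    return index

# Only name + description go into the system prompt; everything else stays on disk.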

Tier 2: On-Demand SKILL.md Loading

When Claude determines a skill is relevant to the user’s request, it uses a tool (typically Bash with a file read operation) to load the full SKILL.md file into the conversation context.

This file contains:

  • Detailed procedural instructions (in Markdown)
  • Step-by-step workflows
  • Conditional logic (“If the user requests X, then do Y”)
  • References to Tier 3 resources
  • Security constraints and best practices

Example structure:

# PlantUML Diagram Generator

## Overview
This skill generates PlantUML diagrams from natural language descriptions.

## Instructions

1. **Determine Diagram Type**: Ask the user or infer from context (sequence, class, component, etc.)

2. **Consult Syntax Reference**:
- For sequence diagrams: Read `references/sequence_syntax.md`
- For class diagrams: Read `references/class_syntax.md`

3. **Generate Diagram Code**: Create valid PlantUML markup

4. **Convert to Image**: Execute `scripts/convert_puml.py` with the generated code

5. **Return Result**: Provide the diagram and offer to make adjustments

## Error Handling
If syntax validation fails, consult `references/common_errors.md` for troubleshooting.

Token cost: 1,000–5,000 tokens (depending on complexity). Benefit: Full procedural knowledge loaded just-in-time, only for relevant skills.

Critically, the SKILL.md best practice (per Anthropic's guidelines) is to keep this file under 500 lines. Why? Because Tier 2 is where you pay the token cost. Everything else should be delegated to Tier 3.
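A simple way to hold yourself to that guideline is a small check you can run locally or in CI; the four-characters-per-token figure below is a rough rule of thumb, not an exact tokenizer.

# Guardrail: flag SKILL.md files that exceed the ~500-line guidance and
# estimate their Tier 2 token cost (roughly 4 characters per token).
import sys
from pathlib import Path

def check_skill(skill_md: Path, max_lines: int = 500) -> bool:
    text = skill_md.read_text(encoding="utf-8")
    lines = text.count("\n") + 1
    approx_tokens = len(text) // 4
    print(f"{skill_md}: {lines} lines, ~{approx_tokens:,} tokens")
    return lines <= max_lines

if __name__ == "__main__":
    results = [check_skill(p) for p in Path(".").glob("**/SKILL.md")]
    sys.exit(0 if all(results) else 1)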

Skill debugger showing PlantUML Skills.md, notice it is 406 bytes and fits within the Anthropic Guidelines
Skill debugger showing PlantUML Skills Triggers and Keywords
Skills Debugger Showing PlantUML skill’s python scripts

Tier 3: Dynamic Resource Bundling

This is where Progressive Disclosure becomes truly powerful. Tier 3 resources are not loaded into the context window unless explicitly requested by the Tier 2 instructions.

These resources include:

  • Scripts: Python, Bash, or other executables (convert_puml.py, validate_syntax.py)
  • Reference Documentation: Comprehensive syntax guides, API documentation
  • Templates: Configuration files, boilerplate code
  • Data Files: CSVs, JSONs, or other structured data

Because these files are accessed via filesystem operations (reading a file, executing a script) rather than being loaded into the LLM’s context, they don’t count against the token limit. This is what makes the context “effectively unbounded.”

Token cost: 0 tokens (until content is read into context, which happens selectively). Benefit: Unlimited knowledge depth without context bloat.
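Selective is the key word. A Tier 2 instruction typically points Claude at one section of a reference file, not the whole thing. Here is a simplistic sketch of that kind of selective read, stopping at the next Markdown heading.

# Pull a single section out of a large Markdown reference file so only the
# relevant slice of Tier 3 ever enters the context. Simplistic sketch:
# it captures from the matching heading until the next heading of any level.
from pathlib import Path

def read_reference_section(path: str, heading: str) -> str:
    section, capturing = [], False
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.startswith("#") and heading.lower() in line.lower():
            capturing = True      # found the heading we want
        elif capturing and line.startswith("#"):
            break                 # next heading: stop before unrelated content
        if capturing:
            section.append(line)
    return "\n".join(section)

# e.g. read_reference_section("references/syntax.md", "Sequence Diagrams")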

Example: A PlantUML skill might have an 80,000-token syntax reference in references/syntax.md. With PDA, that document is only loaded if the user’s request actually requires it, and the Tier 2 instructions make that determination.

Productivity integrations raise the stakes even further. Beyond the core workflow, a real integration also has to cover:

  • Error Handling: Rate limiting, network failures, validation errors
  • Security: API key protection, data sanitization, permission scoping

If you tried to express this as a single OpenAI Tools function schema, you’d end up with a rigid, brittle integration that can’t handle the nuance of real-world usage (“Should I create a new page or update an existing one? Should I preserve formatting or strip it? Should I notify team members?”). MCP is better because it carries richer metadata, but a Skill is a surgical scalpel in comparison.

Progressive Disclosure skills handle this by encoding procedural knowledge instead of function signatures.

Also note that you can describe a process step by step and let the LLM carry it out, but that uses a lot of tokens when a process has many steps. If the steps are repeatable, just create a script and only have the LLM take over if the script fails (see the sketch below). I will talk more about this in the next two articles that I have planned.
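Here is a minimal sketch of that script-first pattern, using the notion_upload.py script from the Notion skill described next; run_llm_fallback is a hypothetical stub standing in for handing control back to the agent.

# Script-first pattern: run the deterministic script; only hand the task back
# to the LLM if the script fails. run_llm_fallback is a hypothetical stub.
import subprocess

def run_llm_fallback(task: str, error_output: str) -> None:
    """Ask the agent to finish the task, given the script's error output."""
    raise NotImplementedError

def upload_with_fallback(markdown_path: str) -> None:
    result = subprocess.run(
        ["python", "notion_upload.py", markdown_path],  # repeatable steps live in the script
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Only now does the LLM spend tokens reasoning about the failure.
        run_llm_fallback(f"Upload {markdown_path} to Notion", result.stderr)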

The PDA Solution: Notion Uploader Skill

The Notion Uploader / Downloader Tool Skill I wrote, loaded up in the Skill Debugger that I also wrote

Here’s the directory structure for a real Notion integration skill:

notion-uploader-downloader/
├── SKILL.md # Tier 2: Workflow orchestration (~3K tokens)
├── notion_upload.py # Tier 3: Python upload script
├── download_confluence.py # Tier 3: Download script
├── config/
│ └── api_templates.json # Tier 3: Configuration templates
└── references/
├── authentication.md # Tier 3: Auth patterns (~2K tokens)
├── block_types.md # Tier 3: Notion block reference (~8K tokens)
└── troubleshooting.md # Tier 3: Common errors (~3K tokens)

Walkthrough: Document Upload Workflow

Let’s trace a user request: “Download these Confluence pages and upload them to Notion with proper formatting”

This is a multi-step, multi-skill workflow that demonstrates PDA’s orchestration capabilities:


Let’s break down each phase:

Phase 1: Skill Discovery (Tier 1)

  • Claude scans metadata: name: "notion-uploader", description: "Uploads documents and data to Notion workspace..."
  • Also finds: name: "confluence-downloader", description: "Downloads pages from Confluence..."
  • Determines both skills are needed for the workflow
  • Token cost: 200 tokens (2 skills × 100 tokens)

Phase 2: Instruction Loading (Tier 2)

  • Loads notion-uploader/SKILL.md
  • Key instructions found:
## Authentication

1. Check for API key in environment: `NOTION_API_KEY`
2. If not found, consult `references/authentication.md` for setup

## Document Upload

1. Transform content to Notion blocks (see `references/block_types.md`)
2. Execute `notion_upload.py` with authentication token
3. Handle errors per `references/troubleshooting.md`

  • Token cost: 200 + 3,000 = 3,200 tokens

Phase 3: Resource Loading (Tier 3)

  • Loads notion_upload.py (the actual upload script)
  • Loads references/authentication.md (OAuth patterns)
  • Does NOT load:
      • references/block_types.md (not needed for simple uploads)
      • references/troubleshooting.md (only loaded if errors occur)
      • download_confluence.py (the user didn’t request a download)
  • Token cost: 3,200 + 5,000 (script) + 2,000 (auth ref) = 10,200 tokens

Phases 4–6: Execution

  • Script executes (outside the context window, so no additional token cost)
  • API calls happen (external to Claude)
  • Result returned to the user

Before/After Comparison: Productivity Integration

  • Documentation Loaded: Traditional Approach: 150,000 tokens | PDA Approach: 10,200 tokens | Savings: 93.2%
  • Cost Per Workflow: Traditional Approach: $0.45 | PDA Approach: $0.03 | Savings: 93.3%
  • Latency: Traditional Approach: 20–30 seconds | PDA Approach: 4–6 seconds | Savings: 80%
  • Error Recovery: Traditional Approach: Manual intervention | PDA Approach: Automatic (Tier 3 troubleshooting) | Savings: 10x faster
  • Capabilities: Traditional Approach: 1–2 integrations | PDA Approach: 10+ integrations | Savings: 5x more

Compare this to a traditional approach where you’d need to load:

  • Complete Notion API documentation: 50,000 tokens
  • Complete Confluence API documentation: 45,000 tokens
  • Authentication guides: 15,000 tokens
  • Block type references: 25,000 tokens
  • Error handling guides: 15,000 tokens
  • Total: 150,000 tokens

Even if you use MCP, the LLM has to discover which methods to use, and using MCP together with Skills is more efficient. MCP is the toolbox, and the Skill becomes the manual for how to use the tools, which saves time because the LLM does not have to figure it out on its own.

PDA Savings: 93.2%

💡 Key Takeaway: For complex workflows, PDA’s savings compound; each additional step would require loading more documentation in traditional approaches, but PDA only loads what’s needed for each specific step.

Common tasks have Python scripts that perform the task directly with the Notion API

Security: The Two-Layer Defense Model

Productivity skills require elevated privileges (API access, script execution). PDA includes a two-layer security model:

Layer 1: Declarative (Skill-Level Intent)

The SKILL.md frontmatter includes an allowed-tools field:

name: "notion-uploader"
description: "Uploads documents to Notion workspace"
allowed-tools:
- "Bash(git:*)" # Only git commands allowed
- "Python" # Python execution permitted
- "FileSystem(read:/docs)" # Read access to /docs only

This declarative restriction prevents the skill from being abused via prompt injection. Even if a malicious prompt tries to trick Claude into executing rm -rf /, the skill's allowed-tools constraint blocks it.

Best Practice: Be as specific as possible. Instead of Bash, use Bash(git status:*) to allow only specific commands.

Layer 2: Imperative (Runtime Sandbox)

All skill execution happens inside a sandboxed container environment with hard kernel-level restrictions:

  • Read-only filesystem (except designated workspaces)
  • Network egress allowlist (only approved API endpoints)
  • Blocked dangerous commands (curl, wget, ssh, etc.)
  • Resource limits (CPU, memory, execution time)

This provides defense-in-depth. Even if the declarative layer is bypassed, the runtime sandbox provides a hard boundary.

State Management: Handling API Tokens and Sessions

Skills are stateless by design — they don’t persist information across sessions. But productivity integrations need state (API tokens, user preferences, session data).

PDA supports four state management patterns:

Pattern 1: Environment Variables

# In notion_upload.py
import os
api_key = os.getenv('NOTION_API_KEY')

Pattern 2: Filesystem Persistence

# Save state to ~/.config/notion/auth.json
import json
import os

with open(os.path.expanduser('~/.config/notion/auth.json'), 'w') as f:
    json.dump({'token': token, 'expires': expires_at}, f)

Pattern 3: External MCP Servers

# Integrate with Model Context Protocol server for persistent storage
mcp-server: "notion-state-manager"

Pattern 4: User Confirmation

## Instructions
1. Ask user for API key if not in environment
2. Store in secure location for session duration
3. Clear on session end

The Notion skill uses Pattern 1 (environment variables) for security and Pattern 4 (user confirmation) for UX.

Real-World Impact: Case Study

Company: FinCompliance Corp (financial services compliance documentation)

Challenge: Automate compliance document workflows across multiple systems

Before PDA:

  • Built custom integration scripts for Confluence/Jira
  • Each script required separate maintenance
  • No reusability across different compliance tasks
  • Context consumption: ~200K tokens for all integrations
  • Cost: $0.60 per workflow execution
  • Development time: 2–3 weeks per new integration

After PDA:

  • Created 5 reusable productivity skills (Confluence, Jira, SharePoint, Notion, GitHub)
  • Skills compose for complex workflows (Extract Jira tickets → Generate report → Upload to SharePoint)
  • Average context consumption: 15K-25K tokens per workflow
  • Cost: $0.045-$0.075 per workflow execution (87.5% reduction)
  • Development time: 2–3 days per new skill

ROI Metrics:

  • Token Efficiency: 87.5% reduction in context usage
  • Cost Savings: $0.525 per workflow × 5,000 workflows/month = $2,625/month saved
  • Capability Expansion: Enabled 8x more concurrent capabilities
  • Development Velocity: 5x faster time-to-market for new integrations

Section 7: Enterprise Implications and Future Directions

As of November 2025, Claude Skills are in feature preview across Pro, Max, Team, and Enterprise plans. They’re available on Claude’s web interface, Claude Code, their Agent framework and API. But what does production readiness actually look like?

You can also use Skills with OpenCode, an open-source competitor to Claude Code that works with 75 different LLMs and lets you log in with your Claude Pro or Max plan or your GitHub Copilot account, among many other options. I imagine most agentic frameworks will support Skills at some level. I have already started integrating them with non-Claude agentic systems, and I expect to soon have several agentic solutions besides OpenCode that use Claude Skills outside the Anthropic ecosystem. PDA is a compelling idea.

Anthropic seems to lead the pack with innovative ideas like MCP and now Claude Skills.

Current State: Feature Preview

Availability:

  • ✅ Pro, Max, Team, Enterprise plans (not on free tier)
  • ✅ Web, desktop, and API access
  • ✅ Code execution must be enabled in settings
  • ⚠️ Custom skill creation more robust on Team/Enterprise

Ecosystem:

  • Growing community repository: github.com/anthropics/skills
  • Pre-built skills for common tasks (document processing, data analysis, integrations)
  • Active development of domain-specific skills (finance, healthcare, legal)

Known Limitations:

  • No centralized admin distribution for organizations (planned for future updates)
  • Custom skills require manual distribution or API deployment
  • Some advanced governance features still in development

The Hybrid Architecture: Skills + MCP

The most powerful approach combines Skills with Model Context Protocol (MCP) servers:

Skills = Methodology (“how to do things”)

  • Procedural knowledge
  • Workflow orchestration
  • Best practices and standards

MCP = Connectivity (“what things are available”)

  • Database access
  • API integrations
  • External tool connections

Example: Multi-System Workflow

User Request: "Create a deployment plan for our new microservice"

1. Skill: `deployment-planner` (orchestrates the workflow)
- Loads procedural knowledge about deployment best practices

2. MCP Server: `github-integration` (provides repository access)
- Skill instructs: "Fetch recent commits from main branch"

3. MCP Server: `kubernetes-cluster` (provides cluster state)
- Skill instructs: "Check current resource utilization"

4. Skill: `cost-estimator` (calculates infrastructure costs)
- Uses data from MCP servers

5. Skill: `approval-workflow` (requires human confirmation)
- Generates deployment plan
- Requests approval before execution

This separation of concerns (methodology vs. connectivity) creates modular, maintainable AI systems.


Security Model: Defense in Depth

For enterprise deployment, security is paramount. PDA includes multiple security layers:

Layer 1: Declarative Intent Restrictions

The allowed-tools field in SKILL.md frontmatter:

name: "secure-document-processor"
description: "Processes sensitive financial documents with encryption"
allowed-tools:
- "FileSystem(read:/approved_docs)" # Only read from approved directory
- "Bash(gpg:*)" # Only GPG commands allowed
- "Python" # Python permitted (scripts are sandboxed)

This prevents prompt injection attacks from abusing the skill for unintended purposes.

Layer 2: Runtime Sandbox

All skill execution happens in isolated containers with:

  • Read-only filesystem (except /tmp and designated workspaces)
  • Network allowlist (only approved endpoints reachable)
  • Command blocklist (dangerous commands like rm -rf, curl, wget blocked)
  • Resource limits (CPU, memory, execution time capped)

Layer 3: Audit Trail

Every skill invocation is logged:

  • Which skill was activated
  • What resources were accessed (Tier 3 files, scripts)
  • What tools were used (Bash commands, Python scripts)
  • What external APIs were called

This provides compliance-grade auditability for regulated industries.
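The exact log format is not something Anthropic publishes, so treat the following as a hypothetical illustration of what one audit record might capture.

# Hypothetical shape of an audit record for one skill invocation.
import json
import time

def audit_record(skill: str, resources: list[str], tools: list[str], apis: list[str]) -> str:
    return json.dumps({
        "timestamp": time.time(),
        "skill": skill,                   # which skill was activated
        "resources_accessed": resources,  # Tier 3 files and scripts that were read
        "tools_used": tools,              # Bash commands, Python scripts executed
        "external_apis": apis,            # outbound API calls
    })

# e.g. audit_record("notion-uploader", ["references/authentication.md"], ["Python"], ["api.notion.com"])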

Governance: CI/CD for Skills

The /v1/skills API endpoint enables programmatic skill management, supporting a full "CI/CD for Skills" workflow:

Development Workflow:

  1. Create Skill (Git repository)
     my-custom-skill/
     ├── SKILL.md
     ├── scripts/
     └── references/
  2. Version Control (Git tags)
     # In SKILL.md frontmatter
     version: "1.2.0"
     git tag v1.2.0
     git push --tags
  3. CI Pipeline (Automated validation)
     # .github/workflows/validate-skill.yml
     - name: Validate SKILL.md
       run: skill-validator SKILL.md
     - name: Audit allowed-tools
       run: security-audit SKILL.md
     - name: Run tests
       run: pytest tests/
  4. CD Pipeline (Deploy to registry)
     # On merge to main
     curl -X POST https://api.anthropic.com/v1/skills \
       -H "Authorization: Bearer $API_KEY" \
       -F "skill=@my-custom-skill.tar.gz" \
       -F "version=1.2.0"
  5. Distribution (Make available to organization)
     # Skill becomes available to all agents
     client.skills.list()  # Returns: [..., "my-custom-skill@1.2.0"]

This transforms skills from “loose files on developer machines” to governed enterprise assets with versioning, testing, and centralized management.

Real-World Use Cases with ROI Metrics

Case Study 1: Financial Services Compliance:

Company: GlobalBank Compliance Division

Challenge: Analyze 10-K filings for regulatory compliance across 500+ public companies

Skills Deployed:

  • pdf-extractor (extract text and tables)
  • sec-regulations (encode compliance rules)
  • report-generator (create audit reports)

Results:

  • Token Efficiency: 93% reduction in context usage vs. loading all SEC regulations upfront
  • Processing Speed: 50+ documents per hour (vs. 5–8 manually)
  • Cost Savings: $0.12 per document vs. $1.80 traditional approach
  • Annual ROI: $420,000 saved in API costs + $1.2M in analyst time
  • Compliance: 100% audit trail for regulatory requirements

Case Study 2: Healthcare Document Processing:

Company: MedTech Solutions (clinical documentation platform)

Challenge: Extract patient information from clinical notes while maintaining HIPAA compliance

Skills Deployed:

  • phi-detector (identify protected health information)
  • anonymizer (redact/encrypt PHI)
  • clinical-summarizer (generate de-identified summaries)

Results:

  • Security: Sandboxed execution ensures PHI never leaves approved environment
  • Processing Volume: 10,000+ clinical notes per day
  • Accuracy: 99.7% PHI detection rate (vs. 94% manual review)
  • Cost: $0.08 per document vs. $0.95 traditional approach
  • Annual ROI: $3.2M saved in processing costs
  • Compliance: Full HIPAA audit trail with zero violations

Case Study 3: DevOps Infrastructure Automation:

Company: CloudScale Inc. (SaaS infrastructure provider)

Challenge: Automate Terraform deployments with multi-stage approval and cost controls

Skills Deployed:

  • terraform-planner (generate and validate plans)
  • cost-estimator (predict infrastructure costs)
  • approval-workflow (human-in-the-loop gates)

Results:

  • Token Efficiency: 75% reduction in context usage (loads only relevant Terraform modules)
  • Deployment Speed: 15 minutes vs. 2 hours manual process
  • Cost Visibility: Prevented $180K in unplanned infrastructure spend
  • Error Reduction: 95% fewer deployment failures
  • Annual ROI: $540K saved in infrastructure costs + $320K in engineer time

Case Study 4: Legal Contract Review:

Company: Morrison and Associates (corporate law firm)

Challenge: Review contracts for compliance with firm-specific guidelines and client requirements

Skills Deployed:

  • contract-analyzer (extract clauses and obligations)
  • firm-policies (encode law firm's review standards)
  • risk-scorer (assess contract risk)

Results:

  • Review Speed: 45 minutes vs. 4–6 hours manual review
  • Consistency: 100% adherence to firm standards (vs. 85% manual)
  • Cost: $12 per contract vs. $450 attorney time
  • Governance: Skills versioned in Git, reviewed by partners before deployment
  • Annual ROI: $1.8M saved in junior attorney time
  • Quality: Full audit trail for client billing and malpractice defense

💡 Key Takeaway: Across industries, PDA delivers 75–95% cost reductions, 5–10x speed improvements, and enterprise-grade compliance — making AI agents viable for production at scale.

Limitations and Mitigation Strategies

PDA is powerful but not without constraints:


Statelessness

  • Description: Skills don’t persist data across sessions
  • Mitigation Strategy: Use external MCP servers, filesystem persistence, or environment variables for state
  • Impact: Low (workarounds available)

Opaque Triggering

  • Description: LLM’s skill selection logic isn’t directly inspectable
  • Mitigation Strategy: Refine skill descriptions for clarity; test with diverse prompts; use explicit skill invocation in critical workflows
  • Impact: Medium (requires testing)

Ecosystem Maturity

  • Description: Not all integrations available as pre-built skills yet
  • Mitigation Strategy: Build custom skills (relatively easy with SKILL.md format); contribute to community repo
  • Impact: Low (easy to extend)

Admin Distribution

  • Description: No centralized org-wide skill deployment (yet)
  • Mitigation Strategy: Use /v1/skills API for programmatic distribution; planned for future updates
  • Impact: Medium (manual workaround)

Section 8: Conclusion — A New Paradigm for AI Knowledge

Progressive Disclosure Architecture isn’t just an optimization technique — it’s a fundamental reimagining of how AI agents access and use knowledge.

The Key Insights

1. Context Windows Are a Precious Resource

The naive “documentation dump” approach fails at scale. Loading 120K-300K tokens upfront creates:

  • Catastrophic performance degradation (15–25 second latency)
  • Unsustainable costs ($0.50-$1.50 per request)
  • Inability to add new capabilities (hitting limits with 3–5 tools)

PDA’s 90% token reduction isn’t incremental — it’s transformative. It’s the difference between “this doesn’t work” and “this scales to production.”

2. Just-in-Time Loading Changes Everything

By loading knowledge on-demand through three tiers:

  • Metadata (100 tokens per skill)
  • Instructions (1–5K tokens when selected)
  • Resources (selective, filesystem-based)

…we decouple “how much knowledge is available” from “how big is the context window.”

This is the architectural breakthrough that enables:

  • 20–30 capabilities in the same context budget as 3–5 traditional tools
  • $0.15-$0.30 per request vs. $2.50 traditional approach
  • 2–4 second latency vs. 15–25 seconds

3. Declarative Beats Imperative for Complex Workflows

For multi-step, ambiguous tasks requiring reasoning:

  • Procedural manuals (Skills) beat function schemas (Tools)
  • Flexibility beats rigidity
  • Human-readable beats machine-only

OpenAI Tools still excel for discrete API calls. Skills excel for orchestration. The future is hybrid: Skills orchestrate Tools for maximum power.

4. Security and Governance Are Built-In

The two-layer security model (declarative + imperative) and human-readable SKILL.md format make Skills:

  • Auditable: Compliance teams can read the instructions
  • Governable: Version control, CI/CD, centralized registry
  • Secure: Sandboxed execution, least-privilege principle

This enables enterprise deployment at scale with full compliance.

The Demonstrated Value

Our two deep-dive examples proved PDA’s power:

Diagramming Skills showed:

  • 98.7% token reduction (157K to 2.1K tokens)
  • “Unbounded context” principle (80K+ token reference docs loaded selectively)
  • 12x more diagram types in the same context budget
  • Real ROI: $5,160/month saved for TechDocs Inc.

Productivity Skills showed:

  • 93% token reduction for complex workflows (150K to 10K tokens)
  • Enterprise-grade security (OAuth, API key management, sandboxing)
  • Multi-step orchestration (Confluence download → transformation → Notion upload)
  • Real ROI: $31,500/year saved for FinCompliance Corp

These aren’t toy examples. These are production-ready integrations handling real enterprise complexity with measurable ROI.

The Philosophical Shift

Progressive Disclosure represents a move from tool-augmented LLMs to true agentic systems:

  • Old paradigm: Give the AI a list of functions it can call
  • New paradigm: Give the AI a library of procedures it can learn

This shift mirrors how we onboard humans:

  • We don’t wire them with function pointers
  • We give them manuals, runbooks, and training guides
  • They learn, reason, and adapt

Skills treat AI agents the same way — as intelligent entities capable of learning from documentation, not just executing predefined functions.

What’s Next

As of November 2025, Claude Skills are in feature preview with active development:

Near-term (2025–2026):

  • Centralized admin distribution for organizations
  • Expanded pre-built skill library (50+ community skills)
  • Enhanced MCP integration for hybrid architectures
  • Improved debugging tools for skill development
  • Performance optimizations (faster Tier 2 loading)

Long-term Vision:

  • Skills as standard for organizational knowledge encoding
  • “Skill marketplaces” for buying/selling specialized capabilities
  • Cross-platform skill portability (if standards emerge)
  • AI agents that compose skills dynamically for novel tasks
  • Integration with enterprise knowledge graphs

Call to Action

If you’re building AI agents for production:

1. Explore the Ecosystem

  • Check out github.com/anthropics/skills for pre-built skills
  • Join the community Discord for best practices and support
  • Review case studies from early adopters

2. Start Simple

  • Create a basic skill for one workflow at your company
  • Use the SKILL.md template from the community repo
  • Test with diverse prompts to refine skill descriptions

3. Measure Impact

  • Compare token usage and performance vs. traditional approaches
  • Track cost savings, latency improvements, and capability expansion
  • Document ROI for stakeholder buy-in

4. Scale Up

  • Build a library of skills encoding your organizational procedures
  • Implement CI/CD pipelines for skill governance
  • Train teams on skill authoring and maintenance

5. Share Back

  • Contribute successful skills to the community repo
  • Document lessons learned and best practices
  • Help shape the future of the ecosystem

Progressive Disclosure isn’t just a better way to manage context — it’s a new paradigm for building AI agents that are simultaneously deeply capable and highly efficient.

The future of AI agents isn’t bigger context windows. It’s smarter knowledge loading.

Further Reading:

Discussion Questions:

  • How could Progressive Disclosure Architecture transform your organization’s AI strategy?
  • What workflows in your company would benefit most from Skills-based automation?
  • How do you balance flexibility (Skills) vs. type safety (Tools) in your AI systems?

Did Progressive Disclosure Architecture change how you think about AI agent design? Share your experiences with Claude Skills in the comments, or connect with me on LinkedIn to discuss enterprise AI implementation strategies.

Tags: #AI #MachineLearning #Claude #Anthropic #EnterpriseAI #ProgressiveDisclosure #AIAgents #DevOps #Automation #TechArchitecture

🔗 Connect & Share:

Follow for more deep dives on AI-powered development, DevOps automation, and modern software engineering practices.

#AI #DeveloperTools #ClaudeCode #SoftwareDevelopment #AgenticAI #FutureOfCoding #Engineering #DevOps #AIAssistants #PairProgramming #ProductivityHacks #TechLeadership


About the Author

Rick Hightower is a technology executive and data engineer with extensive experience at a Fortune 100 financial services organization, where he led the development of advanced Machine Learning and AI solutions to optimize customer experience metrics. His expertise spans both theoretical AI frameworks and practical enterprise implementation.

Rick wrote the skilz universal agent skill installer, which works with Gemini, Claude Code, Codex, OpenCode, GitHub Copilot CLI, Cursor, Aider, Qwen Code, Kimi Code, and about 14 other coding agents. He is also the co-founder of the world’s largest agentic skill marketplace.

Connect with Rick Hightower on LinkedIn or Medium for insights on enterprise AI implementation and strategy.

Community Extensions & Resources

The Claude Code community has developed powerful extensions that enhance its capabilities. Here are some valuable resources from Spillwave Solutions (Spillwave Solutions Home Page):

Integration Skills

  • Notion Uploader/Downloader: Seamlessly upload and download Markdown content and images to Notion for documentation workflows
  • Confluence Skill: Upload and download Markdown content and images to Confluence for enterprise documentation
  • JIRA Integration: Create and read JIRA tickets, including handling special required fields

Recently, I wrote a desktop app called Skill Viewer to evaluate agent skills for safety, usefulness, links, and PDA.


Advanced Development Agents

  • Architect Agent Skill: Puts Claude Code into Architect Mode to manage multiple projects and delegate to other Claude Code instances running as specialized code agents
  • Project Memory Agent Skill: Store key decisions, recurring bugs, tickets, and critical facts to maintain vital context throughout software development

Visualization & Design Tools

  • Design Doc Mermaid Agent Skill: Specialized skill for creating professional Mermaid diagrams for architecture documentation
  • PlantUML Agent Skill: Generate PlantUML diagrams from source code, extract diagrams from Markdown, and create image-linked documentation
  • Image Agent Generation: Uses Gemini Banana to generate images for documentation and design work
  • SDD Agent Skill: A comprehensive Claude Code skill for guiding users through GitHub’s Spec-Kit and the Spec-Driven Development methodology.
  • PR Reviewer Agent Skill: Comprehensive GitHub PR code review skill for Claude Code. Automates data collection via gh CLI, analyzes against industry-standard criteria (security, testing, maintainability), generates structured review files, and posts feedback with approval workflow. Includes inline comments, ticket tracking, and professional review templates.

AI Model Integration

  • Gemini Agent Skill: Delegate specific tasks to Google’s Gemini AI for multi-model collaboration
  • Image_Agent gen: Image generation skill that uses Gemini Banana to generate images.

Explore more at Spillwave Solutions — specialists in bespoke software development and AI-powered automation.
