Rick


Sunday, January 18, 2026

Give your Claude Code, OpenCode and Codex full RAG over docs and code repos

 


Empowering AI Coding Agents with Private Knowledge: The Revolutionary Doc-Serve Agent Skill

In the age of AI agents, one persistent challenge stands out: hallucination and forgetting. Even the most advanced tools like Claude Code can confidently generate plausible but incorrect answers when querying complex, domain-specific information, especially when they lack access to private documentation and proprietary codebases. Public knowledge bases help, but they fall short for internal projects, enterprise systems, or specialized software where the real “source of truth” lives behind closed doors.

That’s where Doc-Serve changes the game. This agent skill provides a private Retrieval-Augmented Generation (RAG) system that combines intelligent document indexing, semantic search, and (here is the clincher) deep code understanding to deliver accurate, grounded responses. At its heart is the Doc-Serve Agent Skill, a native agent skill that works with Claude Code, OpenAI Codex, Gemini CLI, OpenCode and more. This integration turns AI assistants into powerful domain experts by giving them seamless access to your private knowledge base.

Private Context Rich RAG Agent Skill to search code bases and internal docs

I wanted a way to pull down code and documents and make them searchable from my code agent, and that is the birth of the doc-serve agent skill. I wrote agent skills and tools to recursively pull down Notion pages, JIRA tickets, and Confluence documents. Combine this with code bases from GitHub. Sprinkle in Word docs, PDFs, and PowerPoint slides, then index them all with your own personal context-aware semantic RAG built on LlamaIndex, and your coding assistants have access to your entire corpus of private knowledge. This also works with the SDD agent skill (spec-driven development), because now my specs are searchable, and with the architect agent, because now saved plans, designs, and instructions are searchable. It also works with the project memory agent skill. This can give your coding agents more grounded truth to improve design and coding efforts.

Ingesting code and documents so they are searchable via RAG (vector similarity search and BM25)

What Makes Doc-Serve Different?

Most RAG systems treat documents as plain text. Doc-Serve goes deeper, especially with source code:

  • Code-Aware Ingestion: Supports 10 major programming languages (Python, TypeScript, JavaScript, Java, Kotlin, C, C++, Go, Rust, Swift) using Tree-sitter for AST-aware chunking. This means intelligent splitting along functions, classes, and logical boundaries, not arbitrary line counts.
  • LLM-Enhanced Summaries: Every code chunk gets an AI-generated summary powered by Claude Haiku, dramatically improving semantic search relevance. We use the headers and sections of the document as semantic context.
  • Unified Search Across Docs and Code: Query everything at once, or filter precisely by language, source type (documentation versus code), or both.
  • Hybrid Search Power: Combines vector embeddings (OpenAI’s text-embedding-3-large) for semantic understanding with BM25 for exact keyword matching.
  • Context-Aware Chunking: Overlapping chunks preserve meaning across boundaries, avoiding fragmented results.
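The overlap idea from the last bullet is easy to see in miniature. This is a toy, line-based sketch, not Doc-Serve's actual Tree-sitter implementation (which splits along AST boundaries such as functions and classes), but the principle of neighboring chunks sharing content is the same:

```python
def chunk_lines(text, chunk_size=8, overlap=2):
    """Split text into overlapping line-based chunks.

    A simplified stand-in for AST-aware chunking: real splitting
    follows function/class boundaries, but the overlap idea is the
    same -- adjacent chunks share lines so that meaning spanning a
    boundary is never lost to either chunk.
    """
    lines = text.splitlines()
    step = chunk_size - overlap  # advance by less than a full chunk
    chunks = []
    for start in range(0, len(lines), step):
        window = lines[start:start + chunk_size]
        chunks.append("\n".join(window))
        if start + chunk_size >= len(lines):
            break  # last window already reached the end
    return chunks
```

Because each chunk repeats the tail of the previous one, a definition that straddles a chunk boundary still appears intact in at least one chunk.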

The result? Natural language queries like “How is authentication implemented in the backend?” return not just relevant docs, but actual code snippets with summaries, cross-references, and precise context.
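To make the hybrid-search bullet concrete, here is a deliberately tiny sketch of blending a semantic score with a keyword score. It is an illustration only: Doc-Serve's real pipeline uses OpenAI embeddings and a proper BM25 index, and the `alpha` blend weight here is an assumption for demonstration.

```python
import math

def cosine(a, b):
    # Semantic similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_score(query, doc):
    # Crude keyword overlap; a real system would use BM25 with
    # term frequency and document-length normalization.
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0

def hybrid_score(query, doc, q_vec, d_vec, alpha=0.5):
    # Blend semantic similarity with exact keyword matching, so a
    # query matches both by meaning and by literal identifiers.
    return alpha * cosine(q_vec, d_vec) + (1 - alpha) * keyword_score(query, doc)
```

The payoff of the blend: pure vector search can miss an exact identifier like a function name, while pure keyword search misses paraphrases; scoring both ways catches each case.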

Ingesting different types of code bases and providing context-aware chunking to improve RAG, giving better grounding to your coding agents

The Doc-Serve Agent Skill: Claude Code Private Knowledge Superpower

The standout feature is the doc-serve-skill package, a dedicated Claude Code Agent Skill that integrates directly into AI workflows.

Agent skills are now a standard that works with GitHub Copilot, OpenCode, Gemini, Codex, Forge, Cursor, etc.

Defined in the Doc-Serve Agent Skill’s SKILL.md are simple yet powerful commands:

  • query: Search documentation and code with natural language, optionally filtering by language or source type.
  • index: Add or update documents and code to keep the knowledge base current.
  • status: Monitor indexing health and progress.
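The skill normally runs these commands for you, but it can help to see what an invocation looks like under the hood. A minimal sketch of assembling the query command shown later in this post (the helper name `build_query_command` is mine, not part of Doc-Serve):

```python
def build_query_command(query, languages=None):
    """Assemble a doc-svr-ctl query invocation as an argv list.

    Returning the list instead of executing it lets an agent (or a
    test) inspect the exact command before running it, e.g. with
    subprocess.run(argv).
    """
    argv = ["doc-svr-ctl", "query", query]
    if languages:
        # Restrict results to specific programming languages.
        argv += ["languages", *languages]
    return argv
```

An agent filtering to backend code would build `build_query_command("authentication flow", ["python", "typescript"])` and then hand the list to `subprocess.run`.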

Once configured, Claude Code (or Codex or OpenCode) can autonomously query your private RAG system. No more guessing and no more outdated public info. Whether you’re debugging a legacy codebase, onboarding new developers, or building AI agents that need reliable technical knowledge, the Doc-Serve Agent Skill delivers grounded, accurate responses every time.

Real-World Impact

Imagine this scenario:

You’re working on a large monolith with scattered documentation and thousands of lines of Python and TypeScript code. You ask Claude:

Show me how API endpoints are protected in the user service.

With the Doc-Serve Agent Skill enabled:

1. Claude uses the query command filtered to Python code.

2. It retrieves relevant functions with their AI-generated summaries.

3. It cross-references related documentation.

4. You get exact snippets, file paths, and explanations, all from your actual codebase.

The goal: No hallucinations. No forgetting. No digging through files. Just reliable, context-rich answers. In practice, what I have noticed is that the most you can do is reduce hallucinations, not eliminate them entirely, but grounding goes a long way.

Easy to Get Started

Doc-Serve is developer-friendly and open source (MIT licensed):

git clone https://github.com/SpillwaveSolutions/doc-serve-skill.git

cd doc-serve-skill

task install

cp doc-serve-server/.env.example doc-serve-server/.env

Add your OpenAI and Anthropic keys to doc-serve-server/.env, then start the server:

task dev

I believe the skill can install itself, so technically you just install the skill and tell it to set itself up.

Then index your project:

doc-svr-ctl index ./my-project --include-code

And query:

doc-svr-ctl query \
"authentication flow" \
languages python typescript

Again, the skill knows how to do all this so you just tell it which directories to index and what you want to search and it runs the right commands.

The Doc-Serve Agent Skill integrates seamlessly into your AI coding assistant workflows.

The Future of Grounded AI

The Doc-Serve Agent Skill isn’t just another RAG tool. It is a blueprint for the next generation of AI agents that operate with enterprise-grade reliability. By combining private knowledge, code intelligence, and native code agent integration, it eliminates one of the biggest barriers to real-world AI adoption: trust. You can ground your AI assistants in your private corporate knowledge.

Whether you’re a solo developer maintaining a complex project or a team building proprietary systems, Doc-Serve empowers your AI assistants to truly understand your domain.

Check out the repository at https://github.com/SpillwaveSolutions/doc-serve-skill, star it, try the skill, and join the movement toward more reliable, grounded AI. The era of hallucination-free technical assistance is here.

Install guide


Step 1 — Install skilz

Skilz is the universal package manager for AI skills.

pip install skilz

Verify installation:

skilz --version

Step 2 — Install doc-serve-skill

Option A — Global / user install

Installs into your default agent skills directory (e.g. ~/.claude/skills for Claude):

skilz install https://github.com/SpillwaveSolutions/doc-serve-skill

Option B — Project-level install

Use this if you want the skill tied to a specific project directory:

skilz install https://github.com/SpillwaveSolutions/doc-serve-skill --project

This installs locally into:

./.claude/skills/

(or equivalent for your agent)

Step 3 — Target a specific AI agent (optional)

You can install Doc-Serve specifically for different coding assistants using --agent. This flag works for both user-level installs and project-level installs (with --project).

Example: Install for Codex (Cursor, etc.)

skilz install https://github.com/SpillwaveSolutions/doc-serve-skill \
--agent codex

Other supported agents

You can replace codex with any of:

--agent gemini
--agent copilot
--agent opencode
--agent claude

(…and more — 14+ agents total.)

For a project-level install, this will place the skill at:

.codex/skills
.gemini/skills
.github/copilot/skills
.opencode/skill
.claude/skills (the default if you leave off --agent)

For user level (no --project)

~/.codex/skills
~/.gemini/skills
~/.config/opencode/skill
~/.claude/skills (the default if you leave off --agent)

Step 4 — Using Doc-Serve

Once installed:

Enter Planning Mode (or your agent’s equivalent). Describe code bases and documents you want to include.

Ask your coding assistant to:

  • “Use the doc-serve skill to make the following locations searchable… ”

Then let it loose.

Because Doc-Serve has indexed your:

  • specs
  • designs
  • code
  • internal docs

…your agent will now ground its answers in your real, private context rather than generic knowledge.

Or if you have Notion, Confluence, or JIRA skills, you could say:

“save the context of the following epics in JIRA … and look up related documents in Confluence and style guides, as well as these GitHub repos, and then search for related GitHub repos and pull those down. Store Confluence pages and subpages under ./confluence, store epics and related tickets under ./tickets, and put repos under ./related-repos in the architect agent dir.”

I have done something like this recently and it worked very well. It is amazing when you come up with a plan and it works.

Next Steps and random thoughts

I wrote this doc serve skill a while back, and I’ve just started using it more recently. I thought about building this for a long time before I actually built it. It has been percolating for a while. I built something similar for a client in 2023 but the project ended before it went live. The tools are much better now. LlamaIndex is amazing.

One thing I definitely want to add is Ollama-based text embeddings so that I can keep everything in my local environment for search. I also want to use an Ollama-based LLM for summarization and possibly for search-related tasks as well. It was in the original spec and plan, along with a lot of other things, but I wanted something minimal that works that I can evolve. The quickest way to project success, in my opinion, is to get the smallest possible version working, then add on top of that. Even AI-based projects die on the vine for being too much of a moonshot. Keep it simple, and it will already be much harder than you imagined.

Right now, I am using Claude Haiku for summarization in context-aware chunking, along with OpenAI’s latest embeddings for vector indexing and lookup. I want to make all of that configurable so users can swap components in and out more easily: Ollama-based embeddings and an Ollama-based LLM, for example.
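One way to make the embedding backend swappable is to depend on a small interface rather than a concrete provider. A hedged sketch of what that could look like (the class and function names here are hypothetical, not Doc-Serve's current API):

```python
from typing import Protocol

class Embedder(Protocol):
    """Anything that can turn text into a vector."""
    def embed(self, text: str) -> list[float]: ...

class OpenAIEmbedder:
    # Placeholder: would call OpenAI's text-embedding-3-large.
    def embed(self, text: str) -> list[float]:
        raise NotImplementedError("requires an API key")

class OllamaEmbedder:
    # Placeholder: would call a local Ollama model, keeping all
    # data on-machine for a fully private setup.
    def embed(self, text: str) -> list[float]:
        raise NotImplementedError("requires a local Ollama server")

def index_document(text: str, embedder: Embedder) -> list[float]:
    # The indexer depends only on the protocol, so swapping cloud
    # embeddings for local ones becomes a one-line config change.
    return embedder.embed(text)
```

With this shape, choosing OpenAI versus Ollama is a configuration decision rather than a code change, which is exactly the kind of pluggability described above.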

A big goal for me is to make this fully private and local, so that nothing ever has to leave your machine. Fortunately, I have a pretty beefy machine, so performance shouldn’t be a major issue for me personally. But not everyone is blessed with a laptop that will burn your lap.

Another thing I want to improve is the installation experience and overall rigor around setup. I’d like to support multiple installs more cleanly, make it easier to run a lightweight BM25 index alongside a lightweight vector search, and generally improve port and resource management. Ideally, the system should track which ports are being used and make installations more seamless per project. It is not there yet.

Right now, it works, but it still feels like a working proof of concept rather than a polished product. That said, it actually works pretty well. I’ve tried it, and I’ve used it successfully, just not as much as I would like because I got pulled into other projects. A combination of needing to eat, shiny new objects, and projects that keep getting in the way.

Longer term, I’d like to turn this into a full-blown plug-in with proper commands and agents. But that’s probably lower priority for now. One thing I like about keeping it as an agent skill is that it remains compatible with many different AI coding agents. Once it becomes a formal plug-in, compatibility might get trickier across tools. So far, I’ve mostly tested it with Claude Code anyway, when I am using my architect agent, which is an agent skill to manage other coding agents. I use Claude Code and OpenCode fairly equally, with a mix of Codex and Gemini.

Separately, I have other projects that use Postgres with BM25 support and vector embeddings via pgvector. I want to create a version of this doc-serve system that works with Postgres as well. I really love this combination. My current thinking is that it would create its own Postgres database, and potentially even a separate database per project so that everything stays isolated. But maybe it is better to spin up Docker containers. I don’t know. Not everyone has 98 GB of RAM. Also, if someone would like to donate $10,000, I would love a new Mac Studio with 512 GB of RAM. I want one so bad.

I definitely want to keep indexes separate rather than lumping everything together in one giant store. I haven’t fully worked out the best design yet, so I need to think about that more.

Originally, my plan was just to use DuckDB with its vector embedding support. But in practice, I often use this system across multiple AI coding agents at once, and I wasn’t confident DuckDB would scale well in that multi-agent, networked scenario. DuckDB could still be an option, or even one of the new lightweight SQLite variants/forks with networking and replication built in, but it is not a high priority.

That’s why I really wanted more of a networked service layer so that multiple agents could query it simultaneously. I regularly switch between OpenCode, Claude Code, Gemini CLI, and Codex, sometimes even on the same project depending on what I’m doing or which token limits I’ve hit. I do need to hit the same corpus for RAG.

So that’s where my thinking is right now. It’s a fun project, and I’m excited to keep improving it. It is useful today but could use some love.

Shout out to LlamaIndex, which makes so much of this so doable and easy.

Try it out. Throw me some stars and forks.

Architecture of current doc-serve system

About the Author

Rick Hightower is a technology executive and data engineer who led ML/AI development at a Fortune 100 financial services company. He created skilz, the universal agent skill installer, supporting 14+ coding agents including Claude Code, Gemini, Copilot, and Cursor, and co-founded the world’s largest agentic skill marketplace. Connect with Rick Hightower on LinkedIn or Medium.

The Claude Code community has developed powerful extensions that enhance its capabilities. Here are some valuable resources from Spillwave Solutions (Spillwave Solutions Home Page):

