Claude Code, CLAUDE.md, and Token Optimization: Practical Tips for RAG Development Without Burning Your Credits

I really struggled with this post, probably because I wanted to say too much. It is not easy to combine the practical application of AI with a more abstract reflection on the AI phenomenon itself, particularly its multiple consequences from an economic, ideological, and even philosophical point of view. Humbly, I cannot help but analyze both the ideology that praises AI and the one that condemns it.

For this post too, you can find all the files on my GitHub account. See https://github.com/bflaven/ia_usages/tree/main/ia_training_rag_custom

This is probably the reason for my slow pace of publication. I am in a constant state of contradiction. Like anyone else, I have conflicting feelings about artificial intelligence: it worries me and excites me in equal measure.

On the one hand, AI allows me to delve deeper into practical applications; on the other, it raises numerous questions. These questions fuel a distinct feeling that I am enthusiastically working toward my own downfall for the greater benefit of AI platforms — Claude, Perplexity, Mistral, Gemini, to name the principal ones. No doubt others share this observation. If so, it offers me no consolation whatsoever. Okay, let us stop the lamentations and get down to business.

Ironically, RAG is the epitome of laziness transformed into a digital product. With generative AI alone, we already no longer truly write; with RAG, we compound this: it is neither reading nor writing, a genuine tool for intellectual passivity, in a way.

That said, RAG represents a promising digital product to develop, given its diverse application areas. It also serves as an excellent proof of concept (POC) to assess Claude Code’s ability to produce this kind of application under a dual constraint: maximizing output quality (creating a RAG that is as professional as possible) while minimizing costs (tightly controlling token consumption, the new scarce resource).

Using Claude Code to develop a RAG

More seriously, instead of flying blind, I decided to fully leverage Claude Code in order to tackle concrete use cases, including the creation of a RAG (Retrieval-Augmented Generation).

Claude Code now gives me access to an unparalleled level of development capability and explanation.

As a Product Owner, I find that Claude Code can support me at every stage of product creation. It proves to be a valuable sparring partner, helping me logically structure seemingly irrational explorations. Guided by intuition, I can use AI to open and close avenues of exploration far more quickly than before.

The only real constraint is to be as precise as possible in prompts — and ideally, to continuously improve those prompts in an iterative way, ultimately producing a well-crafted CLAUDE.md file.

Whether I need to speak as a Product Owner, a Developer, a Software Architect, a Journalist, a Trainer, or a Consultant, precision is essential. This sometimes means using technical vocabulary, but above all it means prioritizing the narrative of the user need that identifies the value of what you want to deliver.

I can therefore focus primarily on value — the promise of the product I am about to create, in this case a RAG.

This significantly enhances a value-driven approach, a principle that agility demands: a project should be guided solely by the measurable value it delivers. Yet in reality, this is not always the case. With the emergence of AI coding assistants, development itself becomes almost anecdotal — a mere execution step — compared to defining requirements and, therefore, creating value.

The role of AI in exploration and innovation

To be perfectly honest, without AI it would have been impossible for me to explore RAG development in such depth.

Claude Code’s role begins right at the research and benchmarking stage, which I previously performed manually through Git searches. Now I can select a project that closely matches my needs, analyze it independently, and then ask Claude Code to validate or challenge my understanding. This analysis typically reveals best practices, which I can then adopt or reject by noting them in the CLAUDE.md file.

Through an iterative process, I first focus on my own understanding to generate User Stories, then on a set of rules and principles that will structure the development, and finally on the end-to-end use of the product I am building.

Unsurprisingly, AI is taking up more and more of my time, as it allows me to venture into areas I would probably never have explored without its help.

Returning to the RAG project, I attempted to apply the principles outlined above:

  • I first tried to provide the simplest possible definition. Communication is key. “Le faire-savoir est plus important que le savoir-faire.” (“Making it known is more important than the know-how.”)
  • I developed User Stories — after all, I am more Product Owner than anything else.
  • I then audited applications on GitHub, both RAG and other types, primarily in Python, to understand best practices and optimal architecture. It is like reading an open book.

A first pitfall

With Claude Code’s help, the inherent risk in this exercise is being too greedy, too ambitious, too presumptuous — in short, too human and too ignorant.

I quickly made many mistakes: too ambitious in terms of scope, too many seemingly similar but ultimately disparate use cases.

Realizing your mistakes comes at a cost: burning through your Claude Code token credit. I was overly ambitious (a multi-purpose RAG while remaining market-agnostic), too specific in the business rules (AI-Powered Tender Document Evaluation and Report Generation), and I ended up with an unmaintainable application. Claude Code does not exactly encourage humility and frugality, because overuse is, after all, in its own interest.

Claude Code burns tokens voraciously if left unconstrained, and a smart developer must treat tokens as a scarce resource. Tokens are precious. “We wants it, we needs it. Must have the precious,” as Gollum would say. 🙂

Creating a RAG application or any other app without sufficient prior knowledge risks being very expensive in tokens. You are quickly caught in the following dilemma: the more you explore, the more you consume — and the more you consume, the faster you hit the wall. “Do not bite off more than you can chew” is more relevant than ever. That said, it is possible to break down apparent complexity and make it more manageable, little by little, with AI — but it comes at a price.

A beginner’s mistake: the multi-purpose RAG project

The fundamental problem is that AI does not make you question the objective being pursued. It reinforces your convictions even when you are persisting in error, since it never contradicts you unless you explicitly ask it to. This is actually good practice in prompting: once you have substantiated a proposal, ask Claude Code to demolish it.

In the specific case of the multi-purpose RAG, the approach proved too complex, making the pipeline difficult to maintain. And yet building a multi-purpose RAG relies on a simple abstraction, summarized by this user story:

“As a user, I want to explore a large collection of documents in any format by making queries in natural language.”

The goal was to make the RAG market- and business-agnostic. The promise was simple: the domain of activity changes, but the pipeline remains the same.

Remember: do not bite off more than you can chew.

My relative inexperience with RAGs and my systematic reliance on Claude Code taught me one thing: I needed to quickly find a way to reduce or bypass the token limit.

This involved having other AIs (Mistral, Perplexity) write the CLAUDE.md file and reserving Claude Code exclusively for its ability to think and code, while restricting its usage as much as possible.

I experimented with both approaches, and here are some tips for managing tokens effectively. I eventually reached the limit, but much later than expected.

From what I understand, the idea is to set up a kind of tutor for Claude Code that prevents it from systematically re-reading all your application files. This tutor maintains a changelog that records all changes, significantly reducing token consumption during read operations and keeping the focus on thinking and coding. You know the moment Claude Code starts dilly-dallying — scanning entire directories to find a single function, re-reading files it has already seen.
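To make the “tutor” idea concrete, here is a hypothetical sketch (my own illustration, not how OpenWolf actually works): build a lightweight project map once, so the assistant can locate a function from a small index instead of re-reading every file.

```python
import ast
import json
from pathlib import Path

def build_index(root: str) -> dict:
    """Map each Python file to its top-level definitions, so an assistant
    can locate a function from the index instead of re-reading every file."""
    index = {}
    for path in Path(root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError, OSError):
            continue  # skip files that cannot be read or parsed
        names = [
            node.name
            for node in tree.body
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        ]
        index[str(path)] = {"size": path.stat().st_size, "defs": names}
    return index

# Example: persist the map so it can be pasted into a prompt, or kept up to
# date by a hook after each change (the "changelog" part of the idea).
# Path("project_index.json").write_text(json.dumps(build_index("."), indent=2))
```

Feeding this JSON map into the context lets the assistant answer “where is function X?” from a few hundred tokens instead of a full directory scan.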

I feel like a foreman on a construction site: “No more dilly-dallying, just thinking and working. Who runs this joint?”

Let us focus on this topic, which is far more relevant than the manufacture of a RAG or any other application.

Claude Code Token Optimization Problem (CCTOP)

A token management and optimization technique for Claude Code

Facing the token limit with Claude Code, I needed to minimize consumption using any relevant strategy available.

To address this, I implemented an additional layer on top of Claude Code called OpenWolf. This solution prevents Claude Code from excessively reading files, limiting token consumption and optimizing its usage.

OpenWolf

Installing OpenWolf:

node -v
npm install -g openwolf
openwolf --version

Using OpenWolf:

cd your-project
cd /path/001_ai_powered_rag_single
# init
openwolf init
# then launch claude

# in another console, check the dashboard
openwolf dashboard

The explanation from OpenWolf is compelling:

We were building products with Claude Code at Cytostack when we noticed something off. Sessions were eating through tokens faster than they should. When we dug in, we found Claude re-reading the same files multiple times, scanning entire directories to find one function, and having no way to know what a file contained without opening it. There was no project map, no read awareness, no token visibility. So we built the tooling we wished existed — a file index so Claude reads less, a learning memory so it gets smarter, and a ledger that tracks every token. That became OpenWolf.

Other tools exist to help tailor Claude Code to your budget. Due to time constraints and a preference for simplicity, I have only used OpenWolf so far, but a similar tool worth noting is cc-lens.

What does Claude Code itself say about its own Token Optimization Problem (CCTOP)?

I asked Claude Code to investigate the matter and propose a workaround.

The conclusion is clear: aside from prioritizing a well-written CLAUDE.md file, there is no real plug-and-play solution. It comes down to good development practices.

The CLAUDE.md file can be created by another AI to preserve your precious Claude Code tokens — Mistral, Gemini, and Perplexity all do the job well.

What Claude Code proposes is the “context handoff” or “session bridging” technique. This does not bypass the token limit in a technical sense, but it allows you to continue working efficiently despite it.

The principle:
Before reaching the limit, create a file (README, CONTEXT.md, HANDOFF.md, etc.) that summarizes the current state of work, then open a new session importing this file as the starting context.

How to do it properly with Claude Code:

# In the current session, ask Claude to generate the handoff:
"Create a HANDOFF.md summarizing: current task, files modified, decisions made, next steps, and any blockers."

# Then in the new session:
# Launch Claude Code
claude

# Then in the prompt:
"Read HANDOFF.md and continue where we left off."

What this does not solve:

  • The 5-hour limit resets on its own schedule: a new session draws from the same remaining quota.
  • The new session starts without the implicit context of the conversation: the exchanges, the errors encountered, the tone of decisions — only what you have documented is retrieved.
  • For complex projects, a HANDOFF.md that is too brief will miss important nuances.

In a sense, this is also the underlying logic behind OpenWolf and the other tools listed above — which is why I decided to adopt one of them.

Leveraging RAG to extract semantic search as a real digital asset

For noble reasons (ecological awareness, frugality, utilitarianism) or less noble ones (mostly frugality and pragmatism), I sought to extract value from my RAG investigation by leveraging the retrieval step (R) rather than the generation step (G). In short, I interrupt the RAG process at step 5 and export everything to a JSON file. This gives me access to a semantic search capability that I can then implement within my WordPress blog via a custom plugin — so that nothing is wasted.

After this export, thanks to a custom plugin written by Claude Code, I activate semantic search within my blog without relying on an external API at runtime during development.

Process followed:

# 5 steps to leverage the RAG for semantic search only
Step 1 — Ingestion
Step 2 — Chunking
Step 3 — Embeddings
Step 4 — FAISS Index
Step 5 — Retrieval
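The five steps above can be sketched end to end. The snippet below is a toy illustration, not my actual pipeline: the character-frequency embed function stands in for a real embedding model (e.g. sentence-transformers), and the brute-force L2 search in retrieve is what faiss.IndexFlatL2 would do at scale. Stopping at retrieval and dumping chunks plus vectors to JSON is the step-5 export described above.

```python
import json
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy embedding: normalized character-frequency vector. A real
    pipeline would call an embedding model here (step 3)."""
    vec = np.zeros(26, dtype="float32")
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

documents = ["Claude Code manages tokens.", "FAISS indexes embeddings."]  # Step 1: ingestion
chunks = documents  # Step 2: each short document is one chunk here
vectors = np.stack([embed(c) for c in chunks])  # Step 3: embeddings

def retrieve(query: str, k: int = 1) -> list[dict]:
    """Steps 4 and 5: exact L2 nearest-neighbor search over the vectors,
    i.e. what faiss.IndexFlatL2 computes on a real index."""
    q = embed(query)
    dists = np.linalg.norm(vectors - q, axis=1)
    order = np.argsort(dists)[:k]
    return [{"chunk": chunks[i], "distance": float(dists[i])} for i in order]

# Interrupt here: export chunks and vectors to JSON so a client (such as a
# WordPress plugin) can run semantic search without an external API.
export = {"chunks": chunks, "embeddings": vectors.tolist()}
print(json.dumps(export)[:80])
```

The exported JSON is all the plugin needs: embed the user’s query the same way, compute distances against the stored vectors, and return the closest chunks.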

The goal is to query a corpus of articles (a selection from flaven.fr) within a blog and derive a semantic search functionality from that experience, to be made available directly on my website.

General concepts for understanding AI as a phenomenon

If you have made it this far, that is already a remarkable achievement. In the following section, I deliberately move away from practical application to address some of my broader questions about AI.

I humbly question the symbolic consequences of AI’s emergence, much like a sociologist or philosopher might, with two nagging questions:

  1. With the transformation or disappearance of work, AI is already appropriating the cultural and social capital of entire professions — journalists, lawyers, doctors, developers, managers, communications professionals. The question then becomes: how does one survive this dispossession and the potential disappearance that follows?
  2. What ideology does AI espouse? In particular, is it an ideology of domination, to use a fashionable term?

It may be ambitious and provocative to reflect on such questions, but it is the only way to take a meaningful step back.

Several observations are necessary:

  1. Acceleration as an obstacle to reflection: It is difficult to form a solid opinion on AI given the rapid pace of developments and the countless questions they raise. One question replaces another before the previous one has been resolved. This information overload is not insignificant — it can become a weapon of disinformation, depriving individuals of the time needed for analysis and making them more vulnerable to manipulation.
  2. Our emotional biases, a mirror of the biases in the models: Our perception of AI is often subjective and fluctuating. Depending on whether we are optimistic, pessimistic, tired, or euphoric, the conclusions we draw will be radically different. While AI models are rife with cognitive biases, we are not immune — our emotional biases add another layer of distortion to our understanding of the issues.
  3. Escaping the false alternative: Where does this recurring debate between two extremes actually lead us — on one side, unconditional adoration of technology, and on the other, total rejection and a return to the Stone Age? How can we move beyond this sterile opposition and adopt a more nuanced perspective?

With these warnings in mind, here are some positive notes from the experiment:

Claude Code: a revolutionary tool to be used strategically
It sounds obvious, but it is undeniable: Claude Code stands out as the ideal tool for designing, exploring, and executing complex technical projects. However, I have chosen to spare this powerful model from certain tasks by delegating them to other AIs (Perplexity, Mistral):

— Specific technical questions (e.g., “How can spaCy optimize chunking in a RAG?”).
— Documentation writing (CLAUDE.md, README files, SKILLS or ROLE sections).
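As a hypothetical illustration of the first delegated question, one common answer is to use spaCy’s sentence segmentation so that chunks never split mid-sentence, a frequent cause of poor retrieval. The helper below is a sketch, assuming spaCy is installed; spacy.blank plus the built-in “sentencizer” component avoids downloading a full model.

```python
import spacy

# Blank English pipeline with rule-based sentence splitting (no model download).
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars,
    so no chunk ever cuts a sentence in half."""
    chunks, current = [], ""
    for sent in nlp(text).sents:
        if current and len(current) + len(sent.text) + 1 > max_chars:
            chunks.append(current)
            current = sent.text
        else:
            current = f"{current} {sent.text}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Sentence-aware chunks keep each embedding semantically coherent, which generally improves the relevance of the retrieval step.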

The ultimate goal is to develop a methodology that preserves Claude Code’s resources for what truly matters: deep thinking and coding.

Writing is the new skill
Ironically, the more powerful AI becomes, the more crucial human writing becomes.

A rich vocabulary and organized thinking exponentially increase the effectiveness of Claude Code.

— Even a vague idea is enough: a simple user story can serve as a starting point.
— The “Garbage In, Garbage Out” rule has never been more true: mediocre inputs produce mediocre outputs.

It would be presumptuous to claim that AI does not think for me — that it merely amplifies my thinking. In reality, it does think for me, but I am the one guiding this deconstruction-reconstruction learning process.

To summarize, here is what the RAG construction process has taught me.

Cognitive tinkering with AI

The “Disassembly > Partial Understanding > Reassembly” loop can be broken down as follows. We should keep the complexity in perspective: producing a RAG is a known and well-documented problem.

[Complex Problem]

        ↓
[AI as a Deconstruction Tool] → (e.g., "Explain RAG to me")

        ↓
[Knowledge Building Blocks] → (code, concepts, examples)

        ↓
[User Reconstruction] → (experimentation, testing, adjustments)

        ↓
[Functional Solution] → (even if incompletely understood)

        ↓
[Learning Loop] → (repetition to refine understanding)

The automation of rationality
AI is not simply assisting humans — it is gradually replacing tasks that rely on logic, analysis, and methodical execution. These skills are at the heart of many professions, including mine as a Product Owner and that of a developer.

— A Product Owner uses tools like Jira or Trello to prioritize tasks. AI can already do this better, by analyzing user data, feedback, and business metrics.
— A developer writes code. Claude Code can generate 80% of that code. In my case, since I do not consider myself a developer, it is 100% of the RAG code.

The perceived value of these professions is declining because what was once a skill is becoming a commodity accessible to everyone. This calls into question the very value of human labor, since many professions rely on rational and repetitive skills — the product of experience and learning.

On a slightly positive note: for now, AI is a powerful tool for the known, but humans remain essential for the unknown. With their emotions, intuitions, and varying capacity to manage uncertainty, humans still hold sway in a few domains — but for how much longer?

The world according to Karp

This is precisely how I understand the statement by Alex Karp, CEO of Palantir, who is never one to shy away from a provocative pronouncement:

There are basically two ways to know you have a future. One, you have some vocational training. Or two, you’re neurodivergent.

Source: https://fortune.com/2026/03/24/palantir-ceo-alex-karp-two-people-successful-in-ai-era-vocational-skills-neurodivergence-gen-z-career-advice/

The threat, it seems, is no longer even whether you will be replaced, but rather: what work will AI leave for us humans, and what will that mean for our future?

I do not have the answer, but I would like to avoid the binary opposition I mentioned earlier — between fascination with technology and a return to the Stone Age. A career change toward a non-automatable profession? That is like searching for a habitable place by 2050, for those of us who are not climate change deniers.

Regardless of the speaker — AI figures like Amodei, Mensch, Altman, or Thiel, from the most progressive to the most authoritarian — we remain within the same spectrum. We are not escaping liquid capitalism. In both cases, we reach the pinnacle of liberalism: making us entrepreneurs of ourselves. Ideologically, AI exalts the individual only to better bend them to the imperatives of profitability. This is AI’s incredible power of persuasion. In these narcissistic times, who can resist an augmented version of themselves?

I will stop my reflection here, but one element stands out across all of this discourse: symbolic violence, if not real violence. This return to a so-called state of nature, legitimized by a simple maxim — the strongest dominates the weakest. The struggle of each individual for their survival constantly endangers the lives of all. A truth against which no veneer of civilization should stand.

Every technological advance brings progress, but also violence. All progress stems from some form of violence. The discovery of the spear-thrower gave Homo sapiens the ability to dominate the natural world. AI is no exception. Is this the necessary violence that drives all technological progress forward?

More info