Project: Build a RAG App with Claude — Query Your Own Documents

Claude's context window is large — up to 200,000 tokens — but it is not infinite. And it has a training cutoff. Your company's internal documentation, your proprietary research, your customer knowledge base — Claude does not know any of it. You could paste all of it into every prompt, but at scale that is impractical and expensive.
Retrieval-Augmented Generation, or RAG, is the solution. Instead of loading all your documents into Claude's context on every request, you index them in a vector database. When a question arrives, you retrieve only the most relevant chunks and include those in Claude's prompt. Claude generates an answer grounded in those retrieved passages, not in its general training knowledge.
This project builds a complete, functional RAG system: document ingestion, chunking, embedding, vector search, and grounded answer generation with Claude.
What is RAG and Why Does It Matter?
RAG architecture has three phases:
- Indexing: Split documents into chunks, convert each chunk to a vector embedding, and store in a vector database
- Retrieval: When a question arrives, embed the question, find the most semantically similar document chunks using vector similarity search
- Generation: Pass the retrieved chunks to Claude as context and ask Claude to answer the question based on that context
The result: Claude can answer questions accurately from private, current documents without hallucinating facts it does not know.
Prerequisites
- Python 3.9 or later
- pip install anthropic chromadb sentence-transformers pypdf
- An Anthropic API key set as ANTHROPIC_API_KEY
ChromaDB is an open-source vector database that runs locally with no external service required. sentence-transformers provides the local embedding model.
Complete RAG Implementation
import anthropic
import chromadb
from sentence_transformers import SentenceTransformer
from pathlib import Path
import pypdf
import hashlib
import re
from typing import Optional

# Initialise clients
anthropic_client = anthropic.Anthropic()
chroma_client = chromadb.PersistentClient(path="./chroma_db")
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # Fast, accurate local model


# ─── Document Ingestion ───────────────────────────────────────────────────────

def extract_text_from_pdf(pdf_path: str) -> str:
    """Extract text from a PDF file."""
    reader = pypdf.PdfReader(pdf_path)
    text = ""
    for page in reader.pages:
        text += page.extract_text() + "\n\n"
    return text


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """
    Split text into overlapping chunks.

    Args:
        text: Full document text
        chunk_size: Approximate words per chunk
        overlap: Words of overlap between consecutive chunks
    """
    # Clean and normalise whitespace
    text = re.sub(r'\s+', ' ', text).strip()
    words = text.split()

    chunks = []
    start = 0

    while start < len(words):
        end = min(start + chunk_size, len(words))
        chunk = " ".join(words[start:end])

        # Try to end at a sentence boundary, keeping at least 60% of the chunk
        if end < len(words):
            last_period = chunk.rfind('. ')
            if last_period > len(chunk) * 0.6:
                chunk = chunk[:last_period + 1]

        chunks.append(chunk)
        if end == len(words):
            break

        # Advance by the words actually kept (minus the overlap), so text
        # trimmed off at a sentence boundary is not skipped
        start += max(len(chunk.split()) - overlap, 1)

    return chunks


def ingest_document(
    file_path: str,
    collection_name: str,
    document_metadata: Optional[dict] = None
) -> int:
    """
    Ingest a document into the vector database.

    Returns the number of chunks added.
    """
    path = Path(file_path)
    metadata = document_metadata or {}

    # Extract text
    if path.suffix.lower() == ".pdf":
        text = extract_text_from_pdf(file_path)
    elif path.suffix.lower() in [".txt", ".md"]:
        text = path.read_text(encoding="utf-8")
    else:
        raise ValueError(f"Unsupported file type: {path.suffix}")

    # Split into chunks
    chunks = chunk_text(text)
    print(f"  {path.name}: {len(chunks)} chunks from {len(text)} characters")

    # Get or create collection (cosine distance for similarity search)
    collection = chroma_client.get_or_create_collection(
        name=collection_name,
        metadata={"hnsw:space": "cosine"}
    )

    # Generate embeddings
    embeddings = embedder.encode(chunks, batch_size=32, show_progress_bar=False)

    # Chunk IDs are derived from the filename, so re-ingesting the same file
    # replaces its existing chunks
    doc_hash = hashlib.md5(path.name.encode()).hexdigest()[:8]

    chunk_ids = [f"{doc_hash}_{i}" for i in range(len(chunks))]
    chunk_metadata = [
        {
            **metadata,
            "source": path.name,
            "chunk_index": i,
            "total_chunks": len(chunks)
        }
        for i in range(len(chunks))
    ]

    # Store in ChromaDB (upsert, so repeated ingestion does not fail on
    # duplicate IDs)
    collection.upsert(
        ids=chunk_ids,
        embeddings=embeddings.tolist(),
        documents=chunks,
        metadatas=chunk_metadata
    )

    return len(chunks)


def ingest_directory(
    directory: str,
    collection_name: str,
    extensions: Optional[list[str]] = None
) -> int:
    """Ingest all documents in a directory."""
    exts = extensions or [".pdf", ".txt", ".md"]
    total_chunks = 0

    for path in Path(directory).iterdir():
        if path.suffix.lower() in exts:
            print(f"Ingesting: {path.name}")
            chunks = ingest_document(str(path), collection_name)
            total_chunks += chunks

    return total_chunks


# ─── Retrieval ────────────────────────────────────────────────────────────────

def retrieve_relevant_chunks(
    query: str,
    collection_name: str,
    n_results: int = 5,
    min_relevance_score: float = 0.3
) -> list[dict]:
    """
    Find the most relevant document chunks for a query.

    Returns list of dicts with content, source, and relevance score.
    """
    # Use the same cosine-space settings as ingestion
    collection = chroma_client.get_or_create_collection(
        name=collection_name,
        metadata={"hnsw:space": "cosine"}
    )

    if collection.count() == 0:
        return []

    # Embed query
    query_embedding = embedder.encode([query])[0]

    # Search
    results = collection.query(
        query_embeddings=[query_embedding.tolist()],
        n_results=min(n_results, collection.count()),
        include=["documents", "metadatas", "distances"]
    )

    # Convert cosine distances to similarity scores
    chunks = []
    for doc, metadata, distance in zip(
        results["documents"][0],
        results["metadatas"][0],
        results["distances"][0]
    ):
        similarity = 1 - distance  # Cosine similarity

        if similarity >= min_relevance_score:
            chunks.append({
                "content": doc,
                "source": metadata.get("source", "Unknown"),
                "chunk_index": metadata.get("chunk_index", 0),
                "similarity": round(similarity, 3)
            })

    # Sort by similarity (highest first)
    chunks.sort(key=lambda x: x["similarity"], reverse=True)
    return chunks


# ─── Generation (RAG Answer) ─────────────────────────────────────────────────

def answer_question(
    question: str,
    collection_name: str,
    n_chunks: int = 5,
    model: str = "claude-sonnet-4-6"
) -> dict:
    """
    Answer a question using RAG.

    Returns the answer and the source documents used.
    """
    # Retrieve relevant chunks
    chunks = retrieve_relevant_chunks(question, collection_name, n_results=n_chunks)

    if not chunks:
        return {
            "answer": "I could not find relevant information in the document library to answer this question.",
            "sources": [],
            "chunks_used": 0
        }

    # Build context block
    context_sections = [
        f"[Document: {chunk['source']}, Relevance: {chunk['similarity']}]\n{chunk['content']}"
        for chunk in chunks
    ]

    context = "\n\n---\n\n".join(context_sections)

    # Generate answer with Claude
    response = anthropic_client.messages.create(
        model=model,
        max_tokens=2048,
        system="""You are a helpful assistant that answers questions based strictly on provided document excerpts.

Rules:
1. Answer ONLY from the provided context below. Do not use knowledge from your training.
2. If the context does not contain enough information to answer the question, say so clearly.
3. Always cite which document(s) you used in your answer using [Source: filename] notation.
4. If information from multiple documents is relevant, synthesise it clearly.
""",
        messages=[
            {
                "role": "user",
                "content": f"""RETRIEVED DOCUMENT CONTEXT:
{context}

---

QUESTION: {question}

Answer the question based solely on the context above."""
            }
        ]
    )

    answer = response.content[0].text
    unique_sources = list(set(chunk["source"] for chunk in chunks))

    return {
        "answer": answer,
        "sources": unique_sources,
        "chunks_used": len(chunks),
        "retrieved_chunks": chunks
    }


# ─── Interactive RAG Interface ────────────────────────────────────────────────

class RAGApp:
    """Simple interactive RAG application."""

    def __init__(self, collection_name: str):
        self.collection_name = collection_name

    def ingest(self, path: str):
        """Ingest a file or directory."""
        p = Path(path)
        if p.is_dir():
            count = ingest_directory(str(p), self.collection_name)
        else:
            count = ingest_document(str(p), self.collection_name)
        print(f"Ingested {count} chunks total.")

    def ask(self, question: str) -> str:
        """Ask a question and get a grounded answer."""
        result = answer_question(question, self.collection_name)

        print(f"\nAnswer: {result['answer']}")
        print(f"\nSources used: {', '.join(result['sources'])}")
        print(f"Chunks retrieved: {result['chunks_used']}")

        return result["answer"]

    def run_interactive(self):
        """Run an interactive Q&A session."""
        print(f"\nRAG System ready. Collection: {self.collection_name}")
        print("Type your question or 'quit' to exit.\n")

        while True:
            question = input("Question: ").strip()
            if question.lower() in ("quit", "exit"):
                break
            if question:
                self.ask(question)
            print()


# ─── Example Usage ────────────────────────────────────────────────────────────

if __name__ == "__main__":
    # Create RAG app for company documentation
    app = RAGApp(collection_name="company_docs")

    # Ingest documents
    app.ingest("./documents/")  # Point to your document folder

    # Run interactive Q&A
    app.run_interactive()

Choose Chunk Size Based on Your Content
The chunk_size parameter (words per chunk) significantly affects RAG quality. For dense technical documentation, 300-400 words per chunk with 50-word overlap works well. For narrative text or long-form reports, 600-800 words per chunk may be more appropriate to preserve context. Too small, and chunks lack sufficient context for Claude to give complete answers. Too large, and retrieval precision drops because chunks contain too many topics.
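To see how chunk_size affects the number of chunks a document produces (and therefore retrieval granularity), here is a toy sketch using a plain word-window splitter, a simplified version of the word-window logic in chunk_text without the sentence-boundary adjustment. The 2,000-word document is a synthetic stand-in:

```python
def word_chunks(words: list[str], chunk_size: int, overlap: int) -> list[list[str]]:
    """Split a word list into overlapping fixed-size windows."""
    chunks, start = [], 0
    while start < len(words):
        chunks.append(words[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping `overlap` words
    return chunks

doc = ["word"] * 2000  # stand-in for a ~2000-word document

for size in (300, 500, 800):
    n = len(word_chunks(doc, chunk_size=size, overlap=50))
    print(f"chunk_size={size}: {n} chunks")
```

Smaller chunks mean more (and more focused) vectors per document, which sharpens retrieval but gives Claude less surrounding context per hit; larger chunks invert that trade-off.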
Extending to Production
- Replace ChromaDB with a managed vector database like Pinecone, Weaviate, or pgvector for production scale and persistence
- Replace sentence-transformers with a higher-quality hosted embedding model; Anthropic's documentation recommends Voyage AI for embeddings
- Add re-ranking: After vector retrieval, use a cross-encoder model to re-rank chunks by relevance before passing to Claude — improves answer quality significantly
- Implement hybrid search: Combine vector similarity search with keyword BM25 search — hybrid search consistently outperforms either approach alone
- Add document versioning: Track document versions and re-ingest when documents are updated, removing old chunks and adding new ones
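As a sketch of the hybrid-search idea, Reciprocal Rank Fusion (RRF) is a common way to merge a vector ranking and a keyword ranking without having to calibrate their score scales: each list contributes 1/(k + rank) per result. The chunk IDs and the two input rankings below are hypothetical placeholders for the outputs of a vector query and a BM25 query:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked ID lists via RRF: score(id) = sum of 1/(k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["chunk_a", "chunk_b", "chunk_c"]    # e.g. from collection.query
keyword_hits = ["chunk_b", "chunk_d", "chunk_a"]   # e.g. from a BM25 index

print(rrf_fuse([vector_hits, keyword_hits]))
```

Items appearing near the top of both lists (chunk_b, chunk_a here) rise above items found by only one retriever, which is exactly the behaviour hybrid search is after.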
Summary
RAG is the most important architectural pattern for giving Claude accurate knowledge from private or current information. The three-phase pipeline — index, retrieve, generate — is straightforward to implement and scales from a personal knowledge base to enterprise document search.
- Chunk with overlap to preserve context at boundaries
- Use a local embedding model for cost-effective indexing — Claude itself plays no part in the embedding step
- Constrain Claude via the system prompt to answer only from provided context — prevents hallucination
- Cite sources in every answer — users need to know where information came from
Next project in the series: Build an AI-Powered IT Incident Report Generator.
This post is part of the Anthropic AI Tutorial Series. Previous post: Project: Build a Multi-Language Translator App with Claude.
