
Project: Build a RAG App with Claude — Query Your Own Documents


Claude's context window is large — up to 200,000 tokens — but it is not infinite. And it has a training cutoff. Your company's internal documentation, your proprietary research, your customer knowledge base — Claude does not know any of it. You could paste all of it into every prompt, but at scale that is impractical and expensive.

Retrieval-Augmented Generation, or RAG, is the solution. Instead of loading all your documents into Claude's context on every request, you index them in a vector database. When a question arrives, you retrieve only the most relevant chunks and include those in Claude's prompt. Claude generates an answer grounded in those retrieved passages, not in its general training knowledge.

This project builds a complete, functional RAG system: document ingestion, chunking, embedding, vector search, and grounded answer generation with Claude.


What is RAG and Why Does It Matter?

RAG architecture has three phases:

  1. Indexing: Split documents into chunks, convert each chunk to a vector embedding, and store in a vector database
  2. Retrieval: When a question arrives, embed the question, find the most semantically similar document chunks using vector similarity search
  3. Generation: Pass the retrieved chunks to Claude as context and ask Claude to answer the question based on that context

The result: Claude can answer questions accurately from private, current documents without hallucinating facts it does not know.
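Before the full implementation, the three phases can be seen in miniature with plain Python. The bag-of-words "embedding" below is a deliberately crude stand-in for a real embedding model, and the three sample chunks are made up for illustration; everything else mirrors the index, retrieve, generate flow:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a term-frequency bag of words. A real system
    uses a neural model; this only makes the pipeline shape visible."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# 1. Indexing: store (chunk, vector) pairs
chunks = [
    "Claude supports a 200,000 token context window.",
    "Vector databases store embeddings for similarity search.",
    "Our refund policy allows returns within 30 days.",
]
index = [(c, embed(c)) for c in chunks]

# 2. Retrieval: embed the question, rank chunks by similarity
question = "How many days are allowed for returns?"
q_vec = embed(question)
best_chunk, _ = max(index, key=lambda pair: cosine_similarity(q_vec, pair[1]))

# 3. Generation: the retrieved chunk becomes the grounding context in the prompt
prompt = f"Context: {best_chunk}\n\nQuestion: {question}"
print(best_chunk)  # the refund-policy chunk wins on word overlap
```

The real system below swaps the toy vectors for neural embeddings and the `max()` call for a vector database query, but the shape of the pipeline is identical.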


Prerequisites

  • Python 3.9 or later
  • pip install anthropic chromadb sentence-transformers pypdf
  • An Anthropic API key set as ANTHROPIC_API_KEY

ChromaDB is an open-source vector database that runs locally with no external service required. sentence-transformers provides the local embedding model.


Complete RAG Implementation

```python
import anthropic
import chromadb
from sentence_transformers import SentenceTransformer
from pathlib import Path
import pypdf
import hashlib
import re
from typing import Optional

# Initialise clients
anthropic_client = anthropic.Anthropic()
chroma_client = chromadb.PersistentClient(path="./chroma_db")
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # Fast, accurate local model


# ─── Document Ingestion ──────────────────────────────────────────────────────

def extract_text_from_pdf(pdf_path: str) -> str:
    """Extract text from a PDF file."""
    reader = pypdf.PdfReader(pdf_path)
    text = ""
    for page in reader.pages:
        text += page.extract_text() + "\n\n"
    return text


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """
    Split text into overlapping chunks.

    Args:
        text: Full document text
        chunk_size: Approximate words per chunk
        overlap: Words of overlap between consecutive chunks
    """
    # Clean and normalise whitespace
    text = re.sub(r'\s+', ' ', text).strip()
    words = text.split()

    chunks = []
    start = 0

    while start < len(words):
        end = min(start + chunk_size, len(words))
        chunk = " ".join(words[start:end])

        # Try to end at a sentence boundary, but only if that keeps
        # at least 60% of the chunk's characters
        if end < len(words):
            last_period = chunk.rfind('. ')
            if last_period > len(chunk) * 0.6:
                chunk = chunk[:last_period + 1]

        chunks.append(chunk)
        start += chunk_size - overlap

    return chunks


def ingest_document(
    file_path: str,
    collection_name: str,
    document_metadata: Optional[dict] = None
) -> int:
    """
    Ingest a document into the vector database.

    Returns the number of chunks added.
    """
    path = Path(file_path)
    metadata = document_metadata or {}

    # Extract text
    if path.suffix.lower() == ".pdf":
        text = extract_text_from_pdf(file_path)
    elif path.suffix.lower() in [".txt", ".md"]:
        text = path.read_text(encoding="utf-8")
    else:
        raise ValueError(f"Unsupported file type: {path.suffix}")

    # Split into chunks
    chunks = chunk_text(text)
    print(f"  {path.name}: {len(chunks)} chunks from {len(text)} characters")

    # Get or create collection (cosine distance for similarity search)
    collection = chroma_client.get_or_create_collection(
        name=collection_name,
        metadata={"hnsw:space": "cosine"}
    )

    # Generate embeddings
    embeddings = embedder.encode(chunks, batch_size=32, show_progress_bar=False)

    # Prepare IDs and metadata for each chunk
    doc_hash = hashlib.md5(path.name.encode()).hexdigest()[:8]

    chunk_ids = [f"{doc_hash}_{i}" for i in range(len(chunks))]
    chunk_metadata = [
        {
            **metadata,
            "source": path.name,
            "chunk_index": i,
            "total_chunks": len(chunks)
        }
        for i in range(len(chunks))
    ]

    # Store in ChromaDB
    collection.add(
        ids=chunk_ids,
        embeddings=embeddings.tolist(),
        documents=chunks,
        metadatas=chunk_metadata
    )

    return len(chunks)


def ingest_directory(
    directory: str,
    collection_name: str,
    extensions: Optional[list[str]] = None
) -> int:
    """Ingest all documents in a directory."""
    exts = extensions or [".pdf", ".txt", ".md"]
    total_chunks = 0

    for path in Path(directory).iterdir():
        if path.suffix.lower() in exts:
            print(f"Ingesting: {path.name}")
            total_chunks += ingest_document(str(path), collection_name)

    return total_chunks


# ─── Retrieval ───────────────────────────────────────────────────────────────

def retrieve_relevant_chunks(
    query: str,
    collection_name: str,
    n_results: int = 5,
    min_relevance_score: float = 0.3
) -> list[dict]:
    """
    Find the most relevant document chunks for a query.

    Returns a list of dicts with content, source, and relevance score.
    """
    collection = chroma_client.get_or_create_collection(
        name=collection_name,
        metadata={"hnsw:space": "cosine"}
    )

    if collection.count() == 0:
        return []

    # Embed the query
    query_embedding = embedder.encode([query])[0]

    # Search
    results = collection.query(
        query_embeddings=[query_embedding.tolist()],
        n_results=min(n_results, collection.count()),
        include=["documents", "metadatas", "distances"]
    )

    # Convert cosine distances to similarity scores
    chunks = []
    for doc, metadata, distance in zip(
        results["documents"][0],
        results["metadatas"][0],
        results["distances"][0]
    ):
        similarity = 1 - distance  # Cosine similarity

        if similarity >= min_relevance_score:
            chunks.append({
                "content": doc,
                "source": metadata.get("source", "Unknown"),
                "chunk_index": metadata.get("chunk_index", 0),
                "similarity": round(similarity, 3)
            })

    # Sort by similarity (highest first)
    chunks.sort(key=lambda x: x["similarity"], reverse=True)
    return chunks


# ─── Generation (RAG Answer) ─────────────────────────────────────────────────

def answer_question(
    question: str,
    collection_name: str,
    n_chunks: int = 5,
    model: str = "claude-sonnet-4-6"
) -> dict:
    """
    Answer a question using RAG.

    Returns the answer and the source documents used.
    """
    # Retrieve relevant chunks
    chunks = retrieve_relevant_chunks(question, collection_name, n_results=n_chunks)

    if not chunks:
        return {
            "answer": "I could not find relevant information in the document library to answer this question.",
            "sources": [],
            "chunks_used": 0
        }

    # Build context block
    context_sections = [
        f"[Document: {chunk['source']}, Relevance: {chunk['similarity']}]\n{chunk['content']}"
        for chunk in chunks
    ]
    context = "\n\n---\n\n".join(context_sections)

    # Generate answer with Claude
    response = anthropic_client.messages.create(
        model=model,
        max_tokens=2048,
        system="""You are a helpful assistant that answers questions based strictly on provided document excerpts.

Rules:
1. Answer ONLY from the provided context below. Do not use knowledge from your training.
2. If the context does not contain enough information to answer the question, say so clearly.
3. Always cite which document(s) you used in your answer using [Source: filename] notation.
4. If information from multiple documents is relevant, synthesise it clearly.
""",
        messages=[
            {
                "role": "user",
                "content": f"""RETRIEVED DOCUMENT CONTEXT:
{context}

---

QUESTION: {question}

Answer the question based solely on the context above."""
            }
        ]
    )

    answer = response.content[0].text
    unique_sources = list(set(chunk["source"] for chunk in chunks))

    return {
        "answer": answer,
        "sources": unique_sources,
        "chunks_used": len(chunks),
        "retrieved_chunks": chunks
    }


# ─── Interactive RAG Interface ───────────────────────────────────────────────

class RAGApp:
    """Simple interactive RAG application."""

    def __init__(self, collection_name: str):
        self.collection_name = collection_name

    def ingest(self, path: str):
        """Ingest a file or directory."""
        p = Path(path)
        if p.is_dir():
            count = ingest_directory(str(p), self.collection_name)
        else:
            count = ingest_document(str(p), self.collection_name)
        print(f"Ingested {count} chunks total.")

    def ask(self, question: str) -> str:
        """Ask a question and get a grounded answer."""
        result = answer_question(question, self.collection_name)

        print(f"\nAnswer: {result['answer']}")
        print(f"\nSources used: {', '.join(result['sources'])}")
        print(f"Chunks retrieved: {result['chunks_used']}")

        return result["answer"]

    def run_interactive(self):
        """Run an interactive Q&A session."""
        print(f"\nRAG System ready. Collection: {self.collection_name}")
        print("Type your question or 'quit' to exit.\n")

        while True:
            question = input("Question: ").strip()
            if question.lower() in ("quit", "exit"):
                break
            if question:
                self.ask(question)
                print()


# ─── Example Usage ───────────────────────────────────────────────────────────

if __name__ == "__main__":
    # Create RAG app for company documentation
    app = RAGApp(collection_name="company_docs")

    # Ingest documents
    app.ingest("./documents/")  # Point to your document folder

    # Run interactive Q&A
    app.run_interactive()
```

Choose Chunk Size Based on Your Content

The chunk_size parameter (words per chunk) significantly affects RAG quality. For dense technical documentation, 300-400 words per chunk with 50-word overlap works well. For narrative text or long-form reports, 600-800 words per chunk may be more appropriate to preserve context. Too small, and chunks lack sufficient context for Claude to give complete answers. Too large, and retrieval precision drops because chunks contain too many topics.
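The chunk count follows directly from the sliding-window arithmetic: each chunk advances the window start by chunk_size minus overlap words. A quick sketch (the 10,000-word document length is a made-up figure for illustration):

```python
import math

doc_words = 10_000  # hypothetical report length in words

for chunk_size, overlap in [(300, 50), (500, 50), (800, 50)]:
    step = chunk_size - overlap          # window advance per chunk
    n_chunks = math.ceil(doc_words / step)
    print(f"{chunk_size}-word chunks, {overlap}-word overlap -> {n_chunks} chunks")
```

This prints 40, 23, and 14 chunks respectively, which is worth knowing up front: smaller chunks mean more embeddings to compute and store, and more candidates competing at retrieval time.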


Extending to Production

  • Replace ChromaDB with a managed vector database such as Pinecone, Weaviate, or pgvector for production scale and persistence
  • Replace sentence-transformers with a higher-quality embedding model, or with Anthropic's own embeddings API if one is released
  • Add re-ranking: after vector retrieval, use a cross-encoder model to re-rank chunks by relevance before passing them to Claude; this significantly improves answer quality
  • Implement hybrid search: combine vector similarity search with keyword-based BM25 search; hybrid search consistently outperforms either approach alone
  • Add document versioning: track document versions and re-ingest when documents are updated, removing old chunks and adding new ones
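One common way to implement the hybrid-search idea is reciprocal rank fusion (RRF), which merges a vector ranking and a keyword ranking without having to reconcile their incompatible score scales. A minimal sketch; the chunk IDs and the two result lists are hypothetical stand-ins for real ChromaDB and BM25 output:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk IDs. Each ID scores
    1 / (k + rank) per list it appears in; k = 60 is the constant
    from the original RRF paper and damps the weight of top ranks."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from the two retrievers
vector_hits = ["c3", "c1", "c7", "c2"]    # e.g. from ChromaDB similarity search
keyword_hits = ["c7", "c3", "c9"]         # e.g. from a BM25 index

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # chunks appearing high in both lists rise to the top
```

Chunks that both retrievers rank highly (here c3 and c7) float to the top of the fused list, which is exactly the behaviour you want before handing context to Claude.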

Summary

RAG is the most important architectural pattern for giving Claude accurate knowledge from private or current information. The three-stage pipeline — index, retrieve, generate — is straightforward to implement and scales from a personal knowledge base to enterprise document search.

  • Chunk with overlap to preserve context at boundaries
  • Use a local embedding model for cost-effective indexing; Claude is not needed for the embedding step
  • Constrain Claude via the system prompt to answer only from the provided context; this sharply reduces hallucination
  • Cite sources in every answer so users know where information came from

Next project: Build an AI-Powered IT Incident Report Generator.


This post is part of the Anthropic AI Tutorial Series. Previous post: Project: Build a Multi-Language Translator App with Claude.