This is a production-ready Retrieval-Augmented Generation (RAG) system built from the following components:
- Framework: LangChain (RAG orchestration)
- Embeddings: Google Gemini API (semantic understanding)
- Vector DB: Pinecone (scalable vector storage)
- LLM: Google Gemini (gemini-2.5-flash, generation)
- UI: Streamlit (web interface)
┌─────────────────────────────────────────────────────────┐
│ User Interface │
│ (Streamlit Web App) │
│ - File Upload - Chat Interface - Display Results │
└────────────────────┬────────────────────────────────────┘
│
┌────────────────────▼────────────────────────────────────┐
│ RAG Processing Pipeline │
├─────────────────────────────────────────────────────────┤
│ │
│ 1. Document Processor │
│ - Text Extraction (.txt, .pdf, .docx) │
│ - Text Chunking (with overlap) │
│ │
│ 2. Embedding Service │
│ - Generate vectors via Google Gemini │
│ - Dimension: 768 │
│ │
│ 3. Vector Storage │
│ - Pinecone index management │
│ - Metadata storage │
│ │
│ 4. RAG Chain │
│ - Retrieval (semantic search) │
│ - Generation (LLM response) │
│ - Prompt engineering │
│ │
└─────────────────────────────────────────────────────────┘
rag-project/
│
├── src/ # Main source code
│ ├── __init__.py # Package initialization
│ │
│ ├── config/ # Configuration module
│ │ ├── __init__.py
│ │ └── config.py # Centralized config (reads .env)
│ │
│ ├── rag/ # Core RAG implementation
│ │ ├── __init__.py
│ │ ├── pinecone_manager.py # Pinecone CRUD operations
│ │ ├── embedding_service.py # Google Gemini embeddings
│ │ ├── document_processor.py # Document pipeline
│ │ └── rag_chain.py # LangChain RAG chain
│ │
│ └── utils/ # Utility modules
│ ├── __init__.py
│ ├── helpers.py # Logging, formatting
│ ├── chunking.py # Text splitting logic
│ └── text_processor.py # File parsing
│
├── app.py # Streamlit web interface
├── main.py # CLI entry point
├── setup_project.py # Setup automation
├── requirements.txt # Python dependencies
│
├── .env.template # Config template
├── .env # Config (create from template)
│
└── README.md # User documentation
User Upload File
↓
Extract Text (text_processor.py)
↓
Split into Chunks (chunking.py)
↓
Generate Embeddings (embedding_service.py)
↓
Upsert to Pinecone (pinecone_manager.py)
↓
Document Ready for Queries
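The "Split into Chunks" step above can be illustrated with a minimal sketch. It uses a plain fixed-size sliding window with the documented defaults (1000 characters, 200 overlap); the function name `chunk_text` is hypothetical, and the project's chunking.py may work differently (for example, by splitting on paragraph and sentence boundaries).

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: it ignores paragraph and sentence
    boundaries, which a recursive splitter would respect.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text,
        # so no trailing chunk consists purely of overlap.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the defaults, each chunk repeats the last 200 characters of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk when it is shorter than the overlap.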
User Question (Chat Interface)
↓
Generate Question Embedding (embedding_service.py)
↓
Search Pinecone for Similar Chunks (pinecone_manager.py)
↓
Retrieve Top-K Results (default: 5)
↓
Format Context from Retrieved Chunks
↓
Send to Gemini with Custom Prompt (rag_chain.py)
↓
Stream Response to User
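As a rough illustration of the "Format Context from Retrieved Chunks" step above, the sketch below turns a list of search matches into one context string. It assumes each match is a dict carrying the `source` and `text` metadata fields described in this document; the function name `format_context` and the `max_chars` budget are hypothetical and not taken from rag_chain.py.

```python
def format_context(matches: list[dict], max_chars: int = 4000) -> str:
    """Join retrieved chunks into a numbered context block.

    `matches` is assumed to be ordered best-first, each item shaped like
    {"metadata": {"source": ..., "text": ...}}.
    """
    parts = []
    used = 0
    for rank, match in enumerate(matches, start=1):
        meta = match.get("metadata", {})
        block = f"[{rank}] (source: {meta.get('source', 'unknown')})\n{meta.get('text', '')}"
        # Stop adding chunks once the (hypothetical) character budget is spent.
        if used + len(block) > max_chars:
            break
        parts.append(block)
        used += len(block)
    return "\n\n".join(parts)
```

Numbering the chunks and naming their sources in the context makes it easier to link an answer back to the documents it came from.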
- Models Used:
  - models/embedding-001 - Text embeddings
  - gemini-2.5-flash - Text generation
- Key Operations:
  - embed_content() - Generate embeddings
  - ChatGoogleGenerativeAI() - LLM interface
- Index: rag-documents-index (configurable)
- Key Operations:
  - create_index() - Initialize vector database
  - upsert() - Store vectors with metadata
  - query() - Semantic search

    {
      "id": "filename_0_a1b2c3d4",
      "values": [0.23, 0.45, ...], // 768-dimensional embedding
      "metadata": {
        "chunk_index": 0,
        "source": "document.txt",
        "text": "First 500 characters of chunk..."
      }
    }

- chunk_index: Position in source document
- source: Original filename
- text: Content preview (first 500 chars)
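A record of this shape could be assembled as in the sketch below. The helper name `build_record` is hypothetical, and the ID scheme is an assumption: the example ID suggests "filename, chunk index, short hash", so the sketch uses the file stem plus the first 8 hex characters of a SHA-256 digest of the chunk text. The actual hashing used by the project is not stated here.

```python
import hashlib
from pathlib import Path
from typing import Sequence


def build_record(source: str, chunk_index: int, chunk_text: str,
                 embedding: Sequence[float], preview_chars: int = 500) -> dict:
    """Build one vector record with the metadata fields listed above."""
    # Assumed ID scheme: <file stem>_<chunk index>_<8 hex chars of a content hash>.
    digest = hashlib.sha256(chunk_text.encode("utf-8")).hexdigest()[:8]
    return {
        "id": f"{Path(source).stem}_{chunk_index}_{digest}",
        "values": list(embedding),
        "metadata": {
            "chunk_index": chunk_index,
            "source": source,
            # Only a preview is stored, matching the 500-character limit above.
            "text": chunk_text[:preview_chars],
        },
    }
```

Deriving the ID from the content means re-processing an unchanged file produces the same IDs, so an upsert overwrites existing vectors instead of duplicating them.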
| Parameter | Default | Purpose |
|---|---|---|
| CHUNK_SIZE | 1000 | Characters per chunk |
| CHUNK_OVERLAP | 200 | Character overlap between chunks |
| RETRIEVAL_TOP_K | 5 | Number of results to retrieve |
| EMBEDDING_DIMENSION | 768 | Embedding vector dimension |
| LANGCHAIN_VERBOSE | False | Enable verbose logging |
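One way to load and check these parameters is sketched below. The class name `Settings` is hypothetical (the project's own class is `Config`, whose internals are not shown here), and the sketch reads from environment variables directly instead of parsing a .env file. The validation rules are assumptions that follow from the parameter meanings.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings holder using the defaults from the table above."""
    chunk_size: int = 1000
    chunk_overlap: int = 200
    retrieval_top_k: int = 5
    embedding_dimension: int = 768
    langchain_verbose: bool = False

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Settings":
        # Accept any mapping so the loader can be exercised without touching os.environ.
        env = os.environ if env is None else env
        verbose = str(env.get("LANGCHAIN_VERBOSE", "False")).strip().lower()
        settings = cls(
            chunk_size=int(env.get("CHUNK_SIZE", 1000)),
            chunk_overlap=int(env.get("CHUNK_OVERLAP", 200)),
            retrieval_top_k=int(env.get("RETRIEVAL_TOP_K", 5)),
            embedding_dimension=int(env.get("EMBEDDING_DIMENSION", 768)),
            langchain_verbose=verbose in {"1", "true", "yes"},
        )
        settings.validate()
        return settings

    def validate(self) -> None:
        """Reject combinations that cannot work (assumed rules)."""
        if self.chunk_size <= 0:
            raise ValueError("CHUNK_SIZE must be positive")
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("CHUNK_OVERLAP must be >= 0 and smaller than CHUNK_SIZE")
        if self.retrieval_top_k <= 0:
            raise ValueError("RETRIEVAL_TOP_K must be positive")
        if self.embedding_dimension <= 0:
            raise ValueError("EMBEDDING_DIMENSION must be positive")
```

Failing at startup on an overlap larger than the chunk size is cheaper than discovering it later as an endless or empty chunking run.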
- Hallucination Prevention
  - Custom prompt instructs model to refuse out-of-scope questions
  - "I don't have information in the uploaded documents to answer that."
- Context Verification
  - Only uses retrieved documents as context
  - No external data sources
- Source Attribution
  - Links answers back to source documents
  - Shows document excerpts
- Logging
  - All operations logged for audit trail
  - Configurable log levels
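The refusal behaviour described above can be sketched with a plain string template. This is an illustration only: the wording of the template and the names `PROMPT_TEMPLATE`, `build_prompt`, and `is_refusal` are hypothetical, and the project's real prompt lives in rag_chain.py as a LangChain prompt. Only the refusal sentence itself is taken from this document.

```python
REFUSAL = "I don't have information in the uploaded documents to answer that."

# Hypothetical template wording; only the refusal sentence comes from the docs.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n"
    "If the context does not contain the answer, reply exactly:\n"
    '"{refusal}"\n\n'
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(context: str, question: str) -> str:
    """Fill the template with retrieved context and the user's question."""
    return PROMPT_TEMPLATE.format(
        refusal=REFUSAL, context=context.strip(), question=question.strip()
    )


def is_refusal(answer: str) -> bool:
    """Detect the canned refusal so the UI can skip showing sources for it."""
    return answer.strip().strip('"') == REFUSAL
```

Asking for an exact refusal string makes out-of-scope answers easy to detect in code, which free-form apologies are not.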
- Recursive character splitting
- Respects semantic boundaries (paragraphs, sentences)
- Configurable size and overlap
- Batch processing for multiple texts
- Caching ready (can be added)
- Async support ready
- Vector similarity search (cosine distance)
- Top-K filtering
- Metadata filtering support
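To make the retrieval bullets concrete, here is a small in-memory sketch of cosine-similarity search with top-K selection and an optional metadata filter. In the actual system this work is done by Pinecone; the function names and the `source` filter argument are hypothetical and exist only to show the idea.

```python
import math
from typing import Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], records: list[dict], k: int = 5,
          source: Optional[str] = None) -> list[tuple[float, str]]:
    """Return up to k (score, id) pairs, best first.

    When `source` is given, only records whose metadata matches are scored,
    which mimics a metadata filter applied before similarity ranking.
    """
    candidates = [
        r for r in records
        if source is None or r["metadata"].get("source") == source
    ]
    scored = [(cosine_similarity(query, r["values"]), r["id"]) for r in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

A brute-force scan like this is linear in the number of vectors, which is the cost a vector database avoids through approximate indexing.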
- New File Formats

      # Add to text_processor.py
      elif file_ext == ".new_format":
          return TextProcessor._extract_from_new_format(file_path)

- Custom Prompt Templates

      # Modify in rag_chain.py _create_qa_chain()
      CUSTOM_PROMPT = PromptTemplate(
          template="Your custom template...",
          input_variables=["context", "question"]
      )

- New Retrieval Strategies

      # Create in rag_chain.py
      def retrieve_with_reranking(self, question: str):
          # Custom retrieval logic
          ...

    # In .env file
    LANGCHAIN_VERBOSE=True
    LOG_LEVEL=DEBUG

    from src.rag import PineconeManager
    pm = PineconeManager()
    stats = pm.get_index_stats()
    print(stats)  # Shows vector counts, dimensions

    # Create test file
    echo "Test content" > test.txt
    # Process it
    python main.py process test.txt

    from src.rag import RAGChain
    chain = RAGChain()
    result = chain.query("Test question?")
    print(result["answer"])

    from src.rag import DocumentProcessor
    processor = DocumentProcessor()
    chunks = processor.process_file("document.txt", "document.txt")
    print(f"Created {chunks} chunks")

    from src.rag import RAGChain
    chain = RAGChain()
    result = chain.query("What is the main topic?")
    print(result["answer"])
    for doc in result["source_documents"]:
        print(f"Source: {doc.metadata['source']}")

- Process new documents (appends to index)
- Use --namespace flag for isolation
- Clear index if needed: Update PINECONE_INDEX_NAME in .env
- Streamlit UI: "Clear All Data" button
- CLI: Create new index with different name
- Small to medium document repositories (millions of vectors)
- Real-time query performance needs
- Multi-tenant support (via namespaces)
- Cost-effective vector storage
- Consider vector database partitioning
- Implement caching layer
- Add async batch processing
- Monitor Pinecone index size
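The caching layer suggested above could start as small as the sketch below: an in-memory cache that wraps any embedding function and skips repeat calls for identical text. The class name `EmbeddingCache` is hypothetical, and `embed_fn` stands in for whatever function actually calls the embedding API; nothing here is taken from the project's code.

```python
import hashlib
from typing import Callable, Sequence


class EmbeddingCache:
    """In-memory cache keyed by a hash of the input text (illustrative only)."""

    def __init__(self, embed_fn: Callable[[str], Sequence[float]]):
        self._embed_fn = embed_fn  # stand-in for the real embedding call
        self._store: dict[str, tuple[float, ...]] = {}
        self.hits = 0
        self.misses = 0

    def embed(self, text: str) -> list[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            # Store an immutable copy so callers cannot corrupt cached vectors.
            self._store[key] = tuple(self._embed_fn(text))
        return list(self._store[key])
```

A cache like this mainly helps with repeated questions and re-uploaded documents; it lives only as long as the process, so a persistent store would be needed to survive restarts.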
| Issue | Cause | Solution |
|---|---|---|
| No embeddings generated | Invalid API key | Check GOOGLE_API_KEY |
| Connection refused | API timeout | Check internet, retry |
| Hallucinated answers | Prompt design | Adjust prompt template |
| Slow queries | Large TOP_K | Reduce RETRIEVAL_TOP_K |
| Memory issues | Large documents | Reduce CHUNK_SIZE |
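For the "Check internet, retry" advice in the table above, a retry wrapper with exponential backoff is one common approach. The sketch below is generic: the helper name `call_with_retry` is hypothetical, and the exception types to retry on are an assumption, since the Google and Pinecone clients may raise their own error classes.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on transient errors with doubling delays.

    `retry_on` is an assumed set of exception types; adjust it to whatever
    the API client in use actually raises.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # 1x, 2x, 4x, ...
    raise AssertionError("unreachable")
```

Wrapping a call then looks like `call_with_retry(lambda: chain.query("What is the main topic?"))`, with `chain` being the `RAGChain` instance from the usage examples.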
- Config class with all settings
  - validate() method for config checks
- EmbeddingService for generating vectors
  - embed_text() - single text
  - embed_texts() - batch processing
- PineconeManager for index operations
  - create_index() - setup
  - upsert_vectors() - store
  - query_vectors() - retrieve
- DocumentProcessor for full pipeline
  - process_file() - single file
  - process_multiple_files() - batch
- RAGChain for Q&A
  - query() - get answers
  - is_relevant_to_documents() - check relevance
- Beginner: Use Streamlit UI only
- Intermediate: Explore CLI commands
- Advanced: Modify code and add features
- Expert: Integrate into production systems
- Google Generative AI: https://ai.google.dev/
- Pinecone Docs: https://docs.pinecone.io/
- LangChain: https://python.langchain.com/
- Streamlit: https://docs.streamlit.io/
Version: 1.0.0
Last Updated: December 2024
Status: Production Ready